Over a half-billion smartphones are now capable of measuring atmospheric pressure, potentially providing a global surface observing network of unprecedented density and coverage. An earlier study by the authors described an Android app, uWx, that served as a test bed for advanced quality control and bias correction strategies. To evaluate the utility and quality of the resulting smartphone pressure observations, ensemble data assimilation experiments were performed for two case studies over the Pacific Northwest. In both case studies, smartphone pressures improved the analyses and forecasts of assimilated and nonassimilated variables. In case I, which considered the passage of a front across the region, cycled smartphone pressure assimilation consistently improved 1-h forecasts of the altimeter setting, 2-m temperature, and 2-m dewpoint. During a postfrontal period, cycled smartphone pressure assimilation improved mesoscale forecasts of hourly precipitation accumulation. In case II, which considered a major coastal windstorm, cycling experiments assimilating smartphone pressures improved 10-m wind forecasts as well as the predicted track and intensity. For both cases, free-forecast experiments initialized with smartphone data produced forecast improvements extending several hours, suggesting the utility of crowdsourced smartphone pressures for short-term numerical weather prediction.
Surface pressure observations can provide information on all scales of motion, ranging from convectively produced cold pools to midlatitude cyclones. Surface pressure is a particularly valuable surface parameter, since it reflects atmospheric structure through the full depth of the atmosphere and is less influenced by exposure and representation errors than surface temperature, moisture, and wind. These characteristics have motivated interest in evaluating the potential of surface pressure observations for improving data assimilation and numerical weather prediction.
On the synoptic scale, experiments assimilating only surface pressure observations have reproduced upper-tropospheric large-scale circulations (Compo et al. 2006) and generated realistic lower- and middle-tropospheric analyses (Whitaker et al. 2004; Dirren et al. 2007). Considering mesoscale simulations, Wheatley and Stensrud (2010) noted that the hourly assimilation of altimeter setting, and, to a limited degree, altimeter tendency reduced errors in mesohigh position and intensity, resulting in improved model depiction of cold pools. Madaus et al. (2014) assimilated 3-hourly altimeter and altimeter tendency observations from a high-density network of routine airport observations (METARs) and bias-corrected mesonet observations. A monotonic decrease in domain-averaged analysis error occurred as the number of assimilated pressure observations increased.
Since surface pressure alone can constrain model initializations at the surface and aloft, and model initializations are improved as observational density and frequency increase (Anderson et al. 2005; Lei and Anderson 2014; Madaus et al. 2014), large numbers of pressure observations from smartphones possess the potential for improving numerical weather prediction. Surface pressure observations from smartphones offer unparalleled density and can be collected at high temporal frequency (McNicholas and Mass 2018). Hanson (2016), using observation system simulation experiments with synthetic smartphone pressures, concluded that if observational uncertainty could be estimated, smartphone pressures could improve model forecasts.
The development of several crowdsourcing pressure applications, such as PressureNet and WeatherSignal, facilitated the initial evaluation of smartphone pressures for analysis and numerical weather prediction. Mass and Madaus (2014) described the potential of crowdsourcing smartphone pressures for mesoscale numerical weather prediction (NWP) and provided an example of smartphone pressure assimilation for a convective event in eastern Washington State. Madaus and Mass (2017) found that assimilating hourly smartphone pressures resulted in limited improvements to altimeter setting forecasts and a small reduction in forecast skill for other surface variables, such as 2-m temperature and 2-m humidity. The limited positive impact of smartphone pressure observations appeared to result from poor data quality. Madaus and Mass (2017) did not account for sensor bias and elevation uncertainty, which undermined the ability of the smartphone pressures to constrain model forecasts.
The results of Madaus and Mass (2017) motivated a follow-up study (McNicholas and Mass 2018, hereafter MM2018) in which smartphone pressure observations (SPOs) were collected from an Android app (uWx; www.cmetwx.com) that allowed the evaluation of pressure collection and quality control strategies. In uWx, sources of error were reduced, and observational uncertainty was quantified. A machine learning approach predicted and corrected smartphone pressure biases in real time, resulting in marked improvements in the quality of SPOs.
In this study, we evaluate the impacts of the quality control and bias-correction strategies of MM2018 on numerical weather prediction by performing ensemble data assimilation of SPOs, with and without bias correction/quality control, for two case studies. In the first case, an intensifying surface low and trailing cold front traversed the uWx SPO network. The second case study simulated a strong, compact midlatitude cyclone that formed from the remnants of Tropical Storm Songda. In this case, operational systems misplaced the location of landfall, resulting in poor surface wind forecasts.
The remainder of this paper is organized as follows. In section 2, the two events are reviewed. Section 3 describes the design and methodology of the data assimilation/forecasting experiments for both cases. The results of the experiments are examined in sections 4 and 5, respectively. Section 6 discusses the conclusions and implications of this study.
2. Case descriptions
The two cases selected for this study reflect two important types of events in the Pacific Northwest: 1) a typical surface low and frontal passage with postfrontal precipitation and 2) a major coastal windstorm.
a. Case I
This case represents a familiar scenario for operational forecasts in the Pacific Northwest: a surface low and cold frontal passage. Figure 1 provides a synoptic overview of this case. At 1200 UTC 15 November 2016, a surface low was positioned over western Washington, with a weak pressure trough and associated cold front to the south. Aloft (500 hPa), southwesterly flow dominated the region, with a jet streak extending off the Pacific Ocean into northern Oregon. The 15-h forecast from the operational High Resolution Rapid Refresh (HRRR; Blaylock et al. 2017) overestimated the east–west pressure gradient in western Washington, with positive errors (~2 hPa) along the Oregon and southwest Washington coasts and excessively low pressure over the Cascades, eastern Washington, and west of Vancouver Island. The surface temperature errors had less structure, with the low and trough generally being modestly cooler than observed.
b. Case II
In case II, a coastal cyclone developed from the remnants of Tropical Storm Songda (Fig. 2). At 0300 UTC 16 October 2016, a deep surface low was centered over Vancouver Island beneath a negatively tilted 500-hPa trough. The tight pressure gradient around the low produced strong near-surface winds (>25–30 kt, where 1 kt ≈ 0.51 m s−1) over the waters surrounding Vancouver Island. Over the Puget Sound region, the observed near-surface wind speeds were relatively modest (10–15 kt); however, short-range forecasts from multiple operational systems such as the NOAA/NWS Global Forecast System (GFS), the University of Washington’s WRF Model (UW-WRF), and the NOAA/NWS HRRR moved the surface low over the Olympic Peninsula, bringing gale-force wind gusts to the western Washington interior. The 15-h HRRR forecasts misplaced the location of landfall, bringing the surface low approximately 100 km too far east, with large pressure errors (too low) over southern Vancouver Island. Consequently, there were significant near-surface wind forecast errors, most notably in northwest Washington and the eastern Strait of Juan de Fuca, where the predicted winds were too strong. The potential for SPOs to constrain pressure forecasts, especially errors in the intensity and track of the surface low, motivated this case.
a. Model setup
For all ensemble data assimilation (DA) experiments, simulations were performed with the WRF Model (Skamarock et al. 2008). WRF was run with 38 vertical levels, a horizontal grid spacing of 4 km, and a domain encompassing most of the Pacific Northwest. The model domain was centered at (46°N, 122°W) and had dimensions of 1200 km × 900 km. Physics parameterizations (Table 1) reflect those used in the operational National Centers for Environmental Prediction (NCEP) HRRR model (Benjamin et al. 2016). A total of 48 WRF ensemble members were produced using the stochastic kinetic-energy backscatter scheme (SKEBS) to perturb WRF initial and boundary conditions (Berner et al. 2011). SKEBS parameter values are listed in Table 2. Initial conditions at the beginning of a 12-h spinup period were provided by the NOAA/ESRL Rapid Refresh model analysis (RAP; Benjamin et al. 2016), with hourly boundary conditions generated with RAP 1-h forecasts to emulate a real-time cycled DA system in which RAP forecasts are available approximately 1 h after nominal time.
b. Data assimilation
Assimilation experiments were conducted on the Microsoft Azure Cloud using the Data Assimilation Research Testbed (DART; Anderson et al. 2009) ensemble square root adjustment filter. Table 3 lists WRF state variables updated by DART during assimilation experiments. Spatially and temporally varying adaptive covariance inflation was employed to promote and maintain ensemble spread (Anderson et al. 2009). Sampling error correction was applied to help maintain ensemble spread and constrain sampling errors associated with limited ensemble size (Anderson 2012). Gaspari–Cohn localization was used in the horizontal, with a half-width of 500 km (Gaspari and Cohn 1999). Adaptive localization applied a threshold of 500 observations to decrease the localization cutoff in regions of dense observations (Anderson and Collins 2007). For this study, this procedure effectively reduced the localization radius for SPOs to approximately 330 km. The DART system includes quality control (QC) checks on observations to improve assimilation quality. Specifically, when the difference between an observation and the ensemble-mean estimate of that observation exceeded 3 times the ensemble spread, the observation was rejected as an outlier. Surface observations whose elevation deviated from the model elevation by more than 200 m were not assimilated.
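The localization taper and the two rejection rules described above can be illustrated with a minimal Python sketch. It is not DART code. The function names, argument names, and the convention that the taper reaches zero at twice the half-width are assumptions made for illustration, and the outlier test follows the wording of the text (3 times the ensemble spread) rather than any particular DART configuration.

```python
import numpy as np

def gaspari_cohn(distance_km, half_width_km):
    """Fifth-order Gaspari-Cohn taper; zero at and beyond twice the half-width."""
    z = np.atleast_1d(np.abs(np.asarray(distance_km, dtype=float))) / half_width_km
    w = np.zeros_like(z)
    inner = z <= 1.0
    outer = (z > 1.0) & (z < 2.0)
    zi, zo = z[inner], z[outer]
    w[inner] = (-0.25 * zi**5 + 0.5 * zi**4 + 0.625 * zi**3
                - (5.0 / 3.0) * zi**2 + 1.0)
    w[outer] = (zo**5 / 12.0 - 0.5 * zo**4 + 0.625 * zo**3
                + (5.0 / 3.0) * zo**2 - 5.0 * zo + 4.0 - 2.0 / (3.0 * zo))
    return w

def passes_qc(obs, prior_ens, obs_elev_m, model_elev_m,
              spread_factor=3.0, max_elev_diff_m=200.0):
    """Return True if one observation survives both rejection rules."""
    prior_ens = np.asarray(prior_ens, dtype=float)
    spread = prior_ens.std(ddof=1)  # ensemble spread of the prior estimate
    is_outlier = abs(obs - prior_ens.mean()) > spread_factor * spread
    bad_elevation = abs(obs_elev_m - model_elev_m) > max_elev_diff_m
    return not (is_outlier or bad_elevation)

# Illustrative values only: weights at 0, 500, and 1000 km for a 500-km half-width.
weights = gaspari_cohn([0.0, 500.0, 1000.0], 500.0)
```

With a taper of this form, increments from an observation are damped smoothly with distance, so a smaller half-width confines the influence of each SPO to a smaller neighborhood.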
c. Experimental design
In each case study, a control (CNTRL) ensemble was generated using the approach outlined in Fig. 3a. For the control simulations, the 48-member WRF ensemble was advanced hourly with SKEBS perturbed boundary conditions from 1-h RAP forecasts but with no DA. Conversely, in no-cycling DA experiments (Fig. 3b), the 1-h forecast (prior) from the CNTRL ensemble was updated with observations using the DART ensemble square root filter to create an analysis (i.e., the posterior). Since the model was not advanced from updated analyses, no-cycling DA experiments were designed to examine the impact of pressure assimilation on model analyses, given the same prior states. To examine the impact of pressure assimilation on forecasts, cycled DA was performed (Fig. 3c), wherein the model was advanced from analyses (posteriors) produced by assimilating surface pressure observations with DART.
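The difference between the CNTRL, no-cycling, and cycling configurations can be sketched with a toy scalar system. Everything in the block below is hypothetical: `advance` is a stand-in for a 1-h WRF forecast, `assimilate` is a textbook scalar ensemble adjustment update of a directly observed variable, and the numbers are illustrative rather than taken from the experiments.

```python
import numpy as np

def advance(ens):
    """Hypothetical 1-h model step (stand-in for a WRF forecast)."""
    return 0.9 * ens + 1.0

def assimilate(prior, obs, obs_var):
    """Scalar ensemble adjustment update: shift the mean, shrink the deviations."""
    prior_mean, prior_var = prior.mean(), prior.var(ddof=1)
    post_var = 1.0 / (1.0 / prior_var + 1.0 / obs_var)
    post_mean = post_var * (prior_mean / prior_var + obs / obs_var)
    return post_mean + np.sqrt(post_var / prior_var) * (prior - prior_mean)

def run_experiments(initial_ens, observations, obs_var=1.0):
    cntrl, cycled = initial_ens.copy(), initial_ens.copy()
    no_cycling_analyses, cycling_analyses = [], []
    for obs in observations:
        # CNTRL: forecast only, never updated with observations.
        cntrl = advance(cntrl)
        # No-cycling: update the CNTRL prior; the analysis is not carried forward.
        no_cycling_analyses.append(assimilate(cntrl, obs, obs_var))
        # Cycling: the next forecast starts from the previous analysis.
        cycled = assimilate(advance(cycled), obs, obs_var)
        cycling_analyses.append(cycled)
    return cntrl, no_cycling_analyses, cycling_analyses

initial = np.array([8.0, 9.0, 10.0, 11.0, 12.0])
cntrl, no_cyc, cyc = run_experiments(initial, [12.0, 12.0, 12.0])
```

In this toy setting the first analysis is identical in both configurations because the priors are the same. Later analyses diverge because only the cycled ensemble carries observational information forward in time.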
For each case, DA experiments were performed over a 60-h period. For case I, no-cycling DA and cycling DA experiments were performed with SPOs, METARs, and mesonet surface pressure observations available from the Meteorological Assimilation Data Ingest System (MADIS) between 1200 UTC 14 November and 0000 UTC 17 November 2016. In case II, cycling DA experiments with SPOs were performed from 1200 UTC 14 October to 0000 UTC 17 October 2016. All DA experiments were verified with quality-controlled METAR observations.
In both cases, extended forecasts were initialized from the CNTRL ensemble and SPO cycled ensembles to evaluate the impact of SPO assimilation at lead times beyond 1 h. In case I, 6-h free forecasts were initialized every 6 h beginning at 1200 UTC 14 November and ending at 0000 UTC 17 November 2016. In case II, 5-h free forecasts were initialized at 2300 UTC 15 October 2016. In all free-forecast experiments, the full 48-member WRF ensemble was advanced with SKEBS perturbed boundary conditions from the RAP model.
4. Observation preprocessing
Both “corrected” and “uncorrected” SPOs are used in this study. Corrected SPOs are bias corrected and quality controlled using the approach outlined in MM2018. Uncorrected SPOs are retrieved prior to bias correction and quality control. Altimeter setting is used in all experiments, with SPOs reduced to sea level using the altimeter equation [Eq. (2)] in MM2018.
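For orientation, a reduction of station pressure to altimeter setting can be sketched as follows. Equation (2) of MM2018 is not reproduced in this paper, so the block uses the widely used standard-atmosphere altimeter-setting formula as an assumption; the function name, argument names, and example values are hypothetical.

```python
def altimeter_setting_hpa(station_pressure_hpa, elevation_m):
    """Standard-atmosphere altimeter setting (hPa) from station pressure and elevation."""
    p0 = 1013.25      # standard sea level pressure (hPa)
    t0 = 288.15       # standard sea level temperature (K)
    lapse = 0.0065    # standard lapse rate (K per m)
    n = 0.190284      # exponent R * lapse / g for the standard atmosphere
    p = station_pressure_hpa - 0.3
    return p * (1.0 + (p0**n * lapse / t0) * (elevation_m / p**n)) ** (1.0 / n)

# Illustrative values only: a phone reporting 1000 hPa at 100 m elevation.
example = altimeter_setting_hpa(1000.0, 100.0)
```

Because the reduction depends directly on elevation, an error in the assumed phone elevation maps into an altimeter error, which is one reason elevation uncertainty matters for SPOs.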
a. Observation bias correction
For both case studies, SPOs were corrected following the methodology of MM2018. Specifically, a random forest, machine-learning approach (Breiman 2001) was used with uWx data to predict and correct smartphone pressure bias. Random forests were generated using the Python Scikit-learn machine learning library (Pedregosa et al. 2011). For the first case, random forests were trained from 15 August to 9 November 2016. During and prior to the second case study, uWx was advertised to the public, resulting in a doubling of the number of uWx users from approximately 1000 to 2000. Since many SPOs collected during this case were retrieved from smartphones that had just joined the uWx network, bias correction of SPOs using past data was not possible. As a result, SPOs used in the second case study were bias corrected with random forests trained on data retrieved during the month after the event (19 October–23 November 2016).
Quality control of METAR and mesonet observations is performed within MADIS (Miller et al. 2005). Only METAR and mesonet observations that passed the first three stages of MADIS quality control were used in the DA experiments. Because DA was performed hourly, observations were binned by hour. If several observations from a specific METAR or mesonet station fell within 30 min of the hour, only the observation valid closest to the beginning of the hour was retained. This effectively reduced the observation window to 15 min for mesonet observations and 7 min for METAR observations. The same filtering was not performed for SPOs since a single smartphone can provide multiple observations, at unique locations, within a single assimilation cycle.
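The hourly selection of METAR and mesonet reports described above can be sketched in a few lines. The tuple layout, function name, and station identifiers are assumptions for illustration; the rule itself (keep, per station, the report valid closest to the hour within 30 min) follows the text.

```python
from datetime import datetime, timedelta

def select_hourly_obs(observations, analysis_time, window_min=30):
    """Keep, per station, the report closest to the analysis hour within the window.

    observations: iterable of (station_id, valid_time, value) tuples.
    Returns a dict mapping station_id to (valid_time, value).
    """
    window = timedelta(minutes=window_min)
    best = {}
    for station_id, valid_time, value in observations:
        offset = abs(valid_time - analysis_time)
        if offset > window:
            continue  # outside the hourly bin
        if station_id not in best or offset < best[station_id][0]:
            best[station_id] = (offset, valid_time, value)
    return {sid: (t, v) for sid, (_, t, v) in best.items()}

# Illustrative reports only.
hour = datetime(2016, 11, 15, 12, 0)
reports = [
    ("KSEA", datetime(2016, 11, 15, 11, 53), 1009.1),
    ("KSEA", datetime(2016, 11, 15, 12, 20), 1008.7),
    ("MESO1", datetime(2016, 11, 15, 12, 40), 1010.2),
]
selected = select_hourly_obs(reports, hour)
```

No such filter is applied to SPOs in the text, since repeated reports from one phone may come from different locations.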
b. Observation uncertainty
Typically, observation error variances in data assimilation systems are set to a constant value for all altimeter setting observations (Wheatley and Stensrud 2010; Madaus et al. 2014; Madaus and Mass 2017). In this study, the error variances for METAR and mesonet altimeter setting observations were set to 1 and 1.5 hPa², respectively. SPO error variance was calculated as the square of the sum of SPO uncertainty, derived in MM2018 and listed in Table 4. This approach was used to calculate the error variance for both uncorrected and corrected SPOs.
The distribution of corrected/uncorrected SPO error variance for case I is displayed in Fig. 4. SPO error variance is right skewed by smartphones with larger bias correction/estimation uncertainty and at locations where the local terrain variance is large. In Table 4, the various contributions to error variance are different for each smartphone. Since individual smartphones contribute both uncorrected and corrected SPOs, the error variance distribution of uncorrected and corrected SPOs is similar. This suggests that uncorrected SPO error variance is underestimated using the approach outlined above.
c. Spatial and temporal characteristics of SPOs
Figure 5a displays the locations of corrected SPOs during the entire period of case I, as well as for a single time: 1200 UTC 14 November 2016. The distribution of mesonet and verification METAR observations is displayed in Fig. 5b. SPO density from the uWx app in the Seattle, Washington, metropolitan area far exceeds that of existing networks (Fig. 5b), while in rural eastern Washington the coverage is sparser.
The number of SPOs available during cases I and II is displayed in Figs. 5c and 5d. The number of available METAR and mesonet observations is also displayed in Fig. 5c, as these observations were assimilated in case I DA experiments. In contrast to mesonets and METARs, there is a substantial diurnal variation in SPO availability. Fewer SPOs are available overnight when smartphone use is reduced and the smartphone operating system is more likely to limit background tasks such as pressure retrieval. During case I, a small fraction of uncorrected SPOs fail DART’s standard deviation checks, primarily during the day when more smartphones are in motion or located in urban areas where buildings are taller (Fig. 5c). During case II, the number of available SPOs increased as uWx was advertised to the public in the lead-up to the windstorm. In case II, virtually all uncorrected SPOs passed DART’s QC checks (Fig. 5d). This reflects the large uncertainty in the track and intensity of the windstorm in case II, which increased the background ensemble spread, resulting in more lenient DART QC.
5. Case I: Data assimilation and forecast results
a. No-cycling experiments
To evaluate the impact of assimilating SPOs on model analyses, four no-cycling DA experiments were performed. The METAR and MESONET experiments evaluated the impact of assimilating METAR and mesonet altimeter settings, respectively. The PHONE and PHONE_NOQC experiments evaluated the performance of assimilating corrected and uncorrected SPOs, respectively. In all four DA experiments, analysis errors were computed by subtracting METAR observations from the ensemble mean analysis at the locations of all METARs in the model domain.
Figure 6a displays the domain-average altimeter bias for all four DA experiments and the CNTRL. Assimilating METAR, mesonet, and corrected SPOs nearly eliminates the positive pressure bias apparent in the CNTRL. Uncorrected SPOs, many of which were likely retrieved above ground level, introduced a systematic low pressure bias in no-cycling analyses. In the CNTRL, the domain-average pressure bias was a result of 2–3-hPa (positive) pressure biases throughout the Columbia River basin, in the lee of the Cascade Mountains. The CNTRL forecasts, in this region, were characterized by anomalously low temperature and anomalously high pressure throughout the case.
Domain-averaged RMSE was computed each hour for several variables from the analysis error at all METAR locations in the model domain (see the appendix for details). Figure 6b displays the domain-averaged time series of altimeter RMSE for the CNTRL and four no-cycling DA experiments. Period-averaged differences in RMSE between the CNTRL and the four experiments are displayed in the right panel. Relative to the prior (CNTRL), assimilating corrected SPOs consistently reduced the altimeter analysis error at METAR locations by approximately 0.5 hPa (~50%). Assimilating uncorrected SPOs proved nonbeneficial to altimeter setting analyses as the time-averaged altimeter RMSE in the PHONE_NOQC experiment was not significantly different from CNTRL. The assimilation of mesonet altimeter setting resulted in a median altimeter RMSE reduction of 0.6 hPa. The largest reduction in altimeter RMSE was achieved when METAR altimeter setting observations were assimilated. This result is expected as the assimilated observations were not independent of the verification.
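As an illustration of the verification statistics used here, the domain-average bias and RMSE at METAR sites can be computed as below. The appendix defines the exact procedure; this sketch assumes the ensemble-mean analysis has already been interpolated to the METAR locations and that missing reports are flagged as NaN. The function and argument names are hypothetical.

```python
import numpy as np

def domain_bias_and_rmse(analysis_at_sites, metar_obs):
    """Bias and RMSE of analysis-minus-observation errors over all valid sites."""
    err = np.asarray(analysis_at_sites, dtype=float) - np.asarray(metar_obs, dtype=float)
    valid = ~np.isnan(err)  # skip sites with missing reports
    bias = err[valid].mean()
    rmse = np.sqrt(np.mean(err[valid] ** 2))
    return bias, rmse

# Illustrative altimeter values (hPa) at three sites.
bias, rmse = domain_bias_and_rmse([1013.0, 1011.0, 1009.0], [1012.0, 1012.0, 1008.0])
```

Reporting both quantities is useful because a systematic offset, such as the low pressure bias introduced by uncorrected SPOs, inflates the RMSE even when the error pattern is otherwise unchanged.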
Figures 6c–e display time series of 2-m temperature, 2-m dewpoint, and 10-m wind speed analysis error, relative to the prior (CNTRL) error. In both the PHONE and PHONE_NOQC experiments, SPOs generally provided no added benefit to the CNTRL analyses of 10-m wind speed, while a slight period-average improvement in 10-m wind analyses was observed when mesonet/METAR altimeter setting observations were assimilated. Assimilating corrected SPOs reduced the dewpoint and temperature analysis errors by approximately 0.1 and 0.18 K, respectively. RMSE improvements from assimilating the mesonet altimeter setting were comparable to those achieved by assimilating corrected SPOs. Temperature and dewpoint analyses also improved when uncorrected SPOs were assimilated, and assimilating uncorrected SPOs reduced temperature analysis errors to a greater degree than assimilating corrected SPOs.
In the CNTRL, the domain-average temperature bias was negative due to persistent 2–3-K (negative) temperature biases in the Columbia River basin, east of the Cascade Mountains (not shown). SPO assimilation in the PHONE experiment, and, to a greater degree, in the PHONE_NOQC experiment produced negative pressure increments. In the CNTRL, ensemble correlations between pressure and temperature were mostly negative. Consequently, negative pressure increments were associated with positive increments to the temperature field. Analysis increments in the PHONE_NOQC experiment were larger than in the PHONE experiment, as uncorrected SPOs deviated more from the CNTRL analysis and were more numerous/widespread than corrected SPOs. Accordingly, positive temperature increments in the PHONE_NOQC experiment helped offset negative temperature biases in CNTRL, to a greater degree than in the PHONE experiment. As a result, the 2-m temperature analysis RMSE was smaller in the PHONE_NOQC experiment than in the PHONE experiment.
b. Correlation length scale
Altimeter assimilation produced RMSE improvements of different magnitudes for each observed surface variable (Fig. 6). It was initially hypothesized that assimilating pressure should improve wind analyses, since pressure and wind are intimately related; however, this was not the case in the no-cycling DA experiments. To explain this lack of improvement in the wind statistics, the magnitude of the correlation coefficients between surface pressure and itself, 2-m temperature, 2-m specific humidity, and the zonal component of the 10-m wind was computed as a function of distance for each grid point in the CNTRL ensemble (Fig. 7). Figure 7 reveals that surface pressure is correlated with itself at distances of up to 320 km. This distance, defined as the correlation length scale for pressure, is in good agreement with the effective localization radius for SPOs (approximately 330 km) noted in the data assimilation subsection. The second and third most closely correlated variables with surface pressure were 2-m temperature and 2-m specific humidity, respectively. The smaller the correlation magnitude, the smaller the covariance, and the smaller the analysis increments. Analysis error reductions were greater for 2-m temperature than 2-m dewpoint due to temperature’s longer correlation length scale and larger correlation with surface pressure. Little to no improvement was observed for 10-m wind analyses in the no-cycling DA experiments since correlations between ensemble estimates of pressure and wind were minimal.
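The link between ensemble correlation and analysis increment can be made concrete with a small regression sketch. In an ensemble filter, the mean increment applied to an unobserved variable is the observed-variable increment scaled by the ratio of their sample covariance to the prior variance of the observed variable. The block below is a schematic of that relationship only: the ensembles are synthetic, constructed so that "temperature" is negatively correlated with pressure and "wind" is uncorrelated with it, and all names are hypothetical.

```python
import numpy as np

def regression_increment(state_ens, obs_prior_ens, obs_increment_mean):
    """Mean increment of a state variable implied by an increment in observed pressure."""
    cov = np.cov(state_ens, obs_prior_ens, ddof=1)  # 2 x 2 sample covariance matrix
    return cov[0, 1] / cov[1, 1] * obs_increment_mean

def correlation_magnitude(state_ens, obs_prior_ens):
    """Absolute ensemble correlation between a state variable and observed pressure."""
    return abs(np.corrcoef(state_ens, obs_prior_ens)[0, 1])

# Synthetic 48-member ensembles (illustrative only, not model output).
pressure = np.linspace(-1.0, 1.0, 48)   # prior pressure perturbations (hPa)
temperature = -0.8 * pressure           # negatively correlated with pressure
wind = pressure**2                      # uncorrelated with pressure by symmetry

d_temp = regression_increment(temperature, pressure, -1.0)
d_wind = regression_increment(wind, pressure, -1.0)
```

In this construction a negative pressure increment yields a positive temperature increment and essentially no wind increment, mirroring the qualitative behavior described for the no-cycling experiments.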
c. Sensitivity experiments
In previous research, a connection was found between the surface pressure observation density and analysis error (Anderson et al. 2005; Lei and Anderson 2014; Madaus et al. 2014). This relationship is tested here for corrected/uncorrected SPOs by assimilating varying sample sizes of SPOs over the duration of case I. At each assimilation step, specified numbers of SPOs were selected by random sampling without replacement, which ensured that SPOs from the same smartphone were not necessarily assimilated every hour.
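The subsampling step can be sketched with NumPy's random generator. The function name, seed, and counts are illustrative assumptions; the essential point is that indices are drawn without replacement and redrawn at every assimilation step.

```python
import numpy as np

def subsample_spos(n_available, n_requested, rng):
    """Indices of SPOs to assimilate this hour, drawn without replacement."""
    n = min(n_requested, n_available)  # cannot assimilate more than are available
    return np.sort(rng.choice(n_available, size=n, replace=False))

rng = np.random.default_rng(0)          # fixed seed so the draw is repeatable
hour_1 = subsample_spos(1000, 250, rng)
hour_2 = subsample_spos(1000, 250, rng)  # a fresh, independent draw for the next hour
```

Drawing a new sample each hour means a given phone contributes in some cycles and not others, which avoids tying the sensitivity result to a fixed subset of devices.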
Figure 8 displays the results of this sensitivity experiment. Assimilating corrected SPOs resulted in a monotonic decrease of the analysis altimeter, 2-m temperature, and 2-m dewpoint RMSEs relative to the prior (CNTRL) as the number of corrected SPOs assimilated was increased. A similar reduction in 2-m temperature and 2-m dewpoint RMSE was observed when the number of uncorrected SPOs assimilated was increased. Decreases in analysis RMSE of each variable in the PHONE experiment were consistent with the correlation length scale between the variable and surface pressure (Fig. 7). Surface variables more correlated with surface pressure and with longer correlation length scales exhibited larger reductions in analysis RMSE. In the PHONE experiment the largest reductions were observed for altimeter setting, followed by 2-m temperature and 2-m dewpoint. In both the PHONE and PHONE_NOQC experiments, wind analysis RMSE was independent of the number of observations assimilated since the sample covariance between the wind and pressure was, on average, minimal.
d. Cycling experiments
To evaluate the cumulative impact of SPO assimilation, four cycling DA experiments were performed with corrected SPOs (PHONE), uncorrected SPOs (PHONE_NOQC), mesonet altimeter observations (MESONET), and METAR altimeter observations (METAR). For all cycling experiments, 1-h forecast errors were computed by subtracting METAR observations from the prior ensemble-mean 1-h forecast at the location of each METAR observation. A domain-averaged 1-h forecast RMSE was computed at each assimilation step for the ensemble mean altimeter setting, 2-m temperature, 2-m dewpoint, and 10-m wind speed.
Time series of the domain-averaged 1-h altimeter forecast RMSE for each cycling experiment are displayed in Fig. 9a, with period-averaged differences in RMSE between the CNTRL 1-h forecast and the four DA experiments displayed in the right panel. Assimilating corrected SPOs consistently reduced 1-h forecast altimeter RMSEs, with a median RMSE reduction of 0.4 hPa. This reduction in RMSE was not significantly different from the reduction of 1-h forecast altimeter RMSEs observed in the MESONET and METAR experiments. Assimilating uncorrected SPOs provided no benefit to 1-h forecasts of altimeter setting.
Figures 9b–d display the domain-averaged 1-h forecast RMSEs for 2-m temperature, 2-m dewpoint, and 10-m wind, as well as the CNTRL ensemble-mean 1-h forecast RMSE. Assimilating corrected SPOs consistently improved the 1-h temperature forecasts and, to a lesser degree, the 1-h dewpoint forecasts. This result is expected since pressure is more strongly correlated with 2-m temperature than dewpoint. On average, corrected SPOs slightly degraded the performance of 1-h wind forecasts. When uncorrected SPOs were assimilated, reductions in 2-m temperature RMSE were observed but were not sustained. Large increases in 2-m temperature RMSE were observed in the PHONE_NOQC experiment toward the end of the period. In contrast to the no-cycling experiments, the assimilation of uncorrected SPOs degraded the 1-h forecasts of 2-m dewpoint and 10-m wind speed.
To examine the time-averaged spatial distribution of the forecast error, the 1-h forecast RMSE was computed at each METAR verification site over case I, for all cycling DA experiments and the CNTRL experiment. The results are displayed in Fig. 10, which shows the 1-h forecast RMSE difference between each assimilation experiment and the CNTRL experiment, at all verification sites, for altimeter setting, 2-m temperature, 2-m dewpoint, and the 10-m zonal (u) wind component. Figure 10 reveals that reductions in the 1-h forecast RMSE for altimeter setting were widespread in all experiments except for the PHONE_NOQC experiment, in which 1-h forecasts of altimeter setting were degraded in western Washington and southeastern Oregon. Improvements to 1-h forecasts of 2-m temperature were observed throughout the domain in all cycling experiments; however, in the PHONE_NOQC experiment the 2-m temperature forecasts were only marginally improved in western Washington. In the PHONE_NOQC experiment the 1-h forecast RMSE for 2-m dewpoint was increased, relative to the CNTRL, over northwest Washington and Vancouver Island, Canada. In this region the assimilation of uncorrected SPOs produced large negative (positive) pressure (temperature) increments resulting in anomalously warm and dry conditions at the surface. In the METAR, MESONET, and PHONE experiments the 1-h dewpoint forecast RMSE was markedly reduced, relative to the CNTRL, throughout western Washington, where most assimilated observations were located. A slight increase in the u-wind 1-h forecast RMSE, relative to CNTRL, was observed across western Washington in all DA experiments. Since pressure and wind were poorly correlated in this ensemble, wind analysis increments were likely to be spurious. This was particularly true in western Washington, where most observations were assimilated and the analysis increments were largest.
e. Precipitation skill
During case I, the surface low passage was associated with both frontal and postfrontal precipitation. To evaluate the impacts of SPO assimilation on precipitation forecasts, fractions skill scores (FSSs) were computed for 1-h ensemble precipitation forecasts at a threshold of ≥1 mm (see the appendix for details). Gridded observations from NCEP Stage IV 1-h precipitation accumulation analyses were used to compute the FSS at a variety of spatial scales (Fig. 11a). On average, the FSS remained below the “useful” skill threshold of 0.5 suggested by Roberts and Lean (2008). Nevertheless, assimilation of corrected SPOs and, to a lesser degree, uncorrected SPOs improved the time-averaged FSS relative to CNTRL (Fig. 11b). There were several times when the FSS of the 1-h precipitation forecasts in the PHONE and PHONE_NOQC experiments exceeded the useful skill threshold when CNTRL did not (Fig. 11c). A notable example of this is at 1600 UTC 16 November when the FSS in the PHONE experiment peaked during a postfrontal period characterized by a decline in FSS in CNTRL. This peak in the time series of FSS for the PHONE experiment was observed for all neighborhoods (not shown).
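For readers unfamiliar with the FSS, a minimal single-field version in the spirit of Roberts and Lean (2008) is sketched below. The appendix describes how the score was applied to the ensemble; that treatment is not reproduced here. The function names, the zero padding at the domain edges, the restriction to odd window widths, and the 9 × 9 test fields are all assumptions made for illustration.

```python
import numpy as np

def neighborhood_fraction(binary, n):
    """Fraction of exceedance points in each n x n window (zero padded at edges)."""
    assert n % 2 == 1, "window width must be odd"
    pad = n // 2
    padded = np.pad(binary.astype(float), pad, mode="constant")
    # Integral image: each window sum becomes four lookups.
    integral = np.pad(padded.cumsum(axis=0).cumsum(axis=1), ((1, 0), (1, 0)),
                      mode="constant")
    rows, cols = binary.shape
    window_sum = (integral[n:n + rows, n:n + cols] - integral[:rows, n:n + cols]
                  - integral[n:n + rows, :cols] + integral[:rows, :cols])
    return window_sum / (n * n)

def fractions_skill_score(forecast, observed, threshold, n):
    """FSS = 1 - MSE / MSE_ref for neighborhood exceedance fractions."""
    pf = neighborhood_fraction(forecast >= threshold, n)
    po = neighborhood_fraction(observed >= threshold, n)
    mse = np.mean((pf - po) ** 2)
    mse_ref = np.mean(pf ** 2) + np.mean(po ** 2)
    return 1.0 - mse / mse_ref if mse_ref > 0 else float("nan")

# Illustrative fields: one forecast rain cell displaced two grid points from the observed cell.
fcst = np.zeros((9, 9)); fcst[4, 3] = 2.0
obs = np.zeros((9, 9)); obs[4, 5] = 2.0
fss_point = fractions_skill_score(fcst, obs, 1.0, 1)
fss_neighborhood = fractions_skill_score(fcst, obs, 1.0, 5)
```

The displaced cell scores no skill at the grid scale but gains skill as the neighborhood widens, which is why the FSS is evaluated over a range of spatial scales.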
Figure 12 compares the fractional coverage of gridded precipitation from the Stage IV precipitation analyses and the fractional coverage of the ensemble members that met/exceeded the forecast precipitation threshold of 1 mm within a 68-km neighborhood at 1600 UTC 16 November. Figure 12 reveals that CNTRL failed to capture postfrontal precipitation while the PHONE_NOQC experiment overforecast precipitation. This is not surprising as the assimilation of uncorrected SPOs introduced a systematic low pressure bias that promoted precipitation. In contrast, the assimilation of corrected SPOs in the PHONE experiment resulted in a more skillful mesoscale 1-h precipitation forecast. In the PHONE experiment, SPO assimilation reduced the pressure just offshore of the Oregon coast. This reduction in pressure, relative to CNTRL, encouraged the development of shallow convection along the Oregon coast that produced a more realistic distribution of precipitation in the PHONE experiment.
f. Free forecasts
To examine the impact of SPOs on forecasts at longer lead times, 11 free-forecast runs were performed during case I. The 0–6-h free forecasts were initialized with analyses from the cycled PHONE, PHONE_NOQC, and CNTRL ensembles every 6 h from 1200 UTC 14 November to 0000 UTC 17 November 2016. The RMSE of ensemble mean forecasts from all 11 runs was computed for mean sea level pressure (MSLP), 2-m temperature, 2-m dewpoint, and 10-m wind speed as a function of forecast lead time (Fig. 13). The assimilation of corrected SPOs improved the 2-m temperature and 2-m dewpoint RMSEs at forecast lead times up to 6 h, while MSLP forecasts were improved at 3–5-h lead times. When uncorrected SPOs were assimilated, MSLP forecasts were degraded relative to CNTRL. The assimilation of uncorrected SPOs degraded the 2-m dewpoint forecasts at short lead times but reduced the 2-m temperature RMSE at all forecast lead times. In both the PHONE and PHONE_NOQC free-forecast experiments, significant improvements to 10-m wind speed forecasts were not observed.
6. Case II: Data assimilation and forecast results
The second case represents a very different synoptic/mesoscale evolution from case I, with an intense, compact midlatitude cyclone moving northward just offshore of the Pacific coast, with substantial errors in track and intensity in the operational forecasts.
a. Cycling experiments
Figure 14a displays the domain-average altimeter bias and RMSE for cycling experiments assimilating corrected (PHONE) and uncorrected (PHONE_NOQC) SPOs. Period-averaged differences in RMSE between CNTRL and the two DA experiments are displayed in the right panel. For the period average, assimilating uncorrected (corrected) SPOs degraded (improved) the 1-h forecasts of altimeter setting. The time plot of altimeter RMSE reveals that between 0000 and 0600 UTC 16 October the 1-h forecast altimeter RMSE was substantially reduced, relative to CNTRL, in both the PHONE and PHONE_NOQC experiments. During this period, when the surface low approached and made landfall on Vancouver Island, errors in the uncorrected SPOs were smaller than the forecast errors in CNTRL. As in case I, uncorrected SPOs contributed to a low bias in 1-h altimeter forecasts. Domain-averaged 1-h altimeter forecasts were low biased in both the CNTRL and PHONE_NOQC experiments during the period when the low made landfall. Assimilation of corrected SPOs slightly overcorrected the domain-average low pressure bias during this period.
Figure 14b displays the domain-average 10-m wind speed bias and RMSE for the CNTRL, PHONE, and PHONE_NOQC cycling experiments. In this case, wind forecast errors were dominated by errors in the track of the surface low. In CNTRL, 10-m wind speeds were overforecast during the time when the low made landfall. This positive bias was mostly corrected in the PHONE_NOQC and PHONE experiments. In the period average, assimilating SPOs provided no added benefit to domain-averaged wind forecasts.
In this case, errors in the forecast track contributed to poor wind forecasts. Figures 15a and 15b display the intensity and track of the surface low in analyses from the CNTRL, PHONE, and PHONE_NOQC cycling experiments. The analyzed intensity and track from the NOAA HRRR system are also plotted as an estimate of truth. Later in the period, when the surface low entered the MADIS maritime (buoy) and METAR observing networks, the minimum observed MSLP is plotted as well. Since the surface low did not pass directly over observing sites, this estimate can be considered a lower limit on the storm intensity.
Early in the period, prior to 2200 UTC, SPO assimilation had little impact on the analyzed track and intensity of the surface low as the surface low remained far offshore. At 2200 UTC, the analyzed track in the PHONE_NOQC experiment shifted to the northwest of the CNTRL track, in better agreement with the HRRR analysis. A similar northwestward shift in the analyzed surface low position was observed an hour later (2300 UTC) in the PHONE experiment. At this time the magnitude of the analysis increments near the surface low increased substantially (not shown), since prior to this time the distance between the surface low and Seattle, where the majority of the SPOs were located, was greater than the effective localization radius for SPOs.
In the PHONE_NOQC experiment the northwestward shift in the analyzed track was observed an hour earlier because more SPOs were assimilated in this experiment than in the PHONE experiment. In this special case, model errors exceeded the average magnitude of the uncorrected SPO error. For this reason, the quality of observations was of less importance than the quantity, especially along the sparsely observed coastline. The cumulative impact of assimilating coastal SPOs at locations unobserved in the PHONE experiment facilitated an earlier shift in the storm track in the PHONE_NOQC experiment by extending the analysis increments farther offshore than in the PHONE experiment. While the timing of the track shift differed between the two SPO experiments, the location of landfall was the same in both and in better agreement with the HRRR analysis than in CNTRL. Likewise, surface low intensity analyses in the PHONE and PHONE_NOQC experiments were closer to the HRRR analysis and minimum observed MSLP than those in CNTRL. While not shown in Fig. 15, similar improvements to analyses were retained in 1-h cycled forecasts of the surface low intensity and position.
b. Free forecasts
To evaluate how SPO assimilation impacted forecasts of the surface low track and intensity at longer lead times, free forecasts were initialized at 2300 UTC 15 October from the cycled CNTRL, PHONE_NOQC, and PHONE ensembles. This initialization time was chosen because by this time the low positions in both SPO experiments had deviated from the position of the low in CNTRL. In case I, free-forecast experiments showed MSLP forecast improvements at lead times up to 5 h. For this reason, 0–5-h free forecasts were evaluated in case II.
Figures 16a and 16b display 0–5-h forecasts, initialized at 2300 UTC, of the surface low intensity and position for the HRRR, CNTRL, PHONE_NOQC, and PHONE experiments. In the PHONE and PHONE_NOQC experiments, the surface low intensity was closer to the HRRR analysis and minimum observed MSLP than in CNTRL. In both SPO experiments, initial improvements to the surface low intensity were retained at all forecast lead times. Similarly, improvements in the track of the surface low were observed at all forecast lead times in the PHONE and PHONE_NOQC experiments (Fig. 16b). At forecast lead times of 2–5 h, the surface low track in the PHONE_NOQC experiment overlapped with the HRRR-analyzed track. In the PHONE experiment, the surface low tracked parallel to the HRRR analysis as the low approached land, making landfall approximately 25 km east of the HRRR-analyzed track. In CNTRL the surface low tracked approximately 100 km east of the HRRR-analyzed track, making landfall on the Olympic Peninsula before crossing the Strait of Juan de Fuca.
c. Wind forecast analysis
In case II, the intensity and position of the surface low impacted the distribution and strength of near-surface winds along the western Washington coast and the interior. To evaluate the performance of ensemble near-surface wind and wind gust forecasts, the Brier skill score (BSS) was employed (see the appendix for details). Using the CNTRL ensemble as a reference forecast, the BSS was calculated for SPO cycling and free-forecast experiments. MADIS maritime and METAR near-surface wind observations were used for verification. Time plots of BSS for probabilistic forecasts of 10-m wind speed exceeding 10 m s−1 and surface wind gusts exceeding gale force (17.2 m s−1) are shown in Fig. 17. The assimilation of uncorrected/corrected SPOs resulted in more skillful 10-m wind and surface gust forecasts from 0100 to 0400 UTC 16 October 2016. By this time, the surface low intensity had decreased, and the low position had shifted northwest relative to CNTRL. Free forecasts initialized with analyses from the PHONE and PHONE_NOQC cycling experiments produced more skillful 10-m wind speed and surface gust forecasts than CNTRL at 2–5-h forecast lead times. Improvements in near-surface wind forecast skill were greatest in the PHONE_NOQC free-forecast experiment, as in this experiment the surface low track was farthest from the CNTRL track and closest to the analyzed HRRR track. In both SPO forecast experiments, improvements to the initial surface low intensity and position were retained at forecast lead times up to 5 h, facilitating more skillful wind forecasts at equivalent forecast lead times.
This paper examines the impact of smartphone pressure observations (SPOs) for two events: the first involving the passage of a trough and associated cold front and the second associated with the landfall of an intense low pressure center. For each case, there is an evaluation of the impacts of advanced quality control strategies and machine learning for the bias correction of smartphone pressure observations. In addition, the impact of improved quality control/bias correction of smartphone pressure observations on forecast skill is examined, building on previous work (McNicholas and Mass 2018).
In case I, a surface low/trough traversed the Puget Sound region, where SPO density was greatest. During this event, corrected SPOs consistently reduced the analysis error of the altimeter setting, 2-m temperature, and 2-m dewpoint. Reductions in 1-h forecast errors for these surface variables were achieved when corrected SPOs were assimilated in the cycling mode. Such reductions in RMSE were consistent in time and space. Compared to experiments that assimilated pressures from traditional mesonets, the assimilation of corrected SPOs resulted in nearly equivalent reductions in domain-averaged altimeter and temperature forecast/analysis errors. Likewise, the spatial distribution of forecast improvements was markedly similar in experiments assimilating corrected SPOs and pressures from traditional mesonets and METARs.
The assimilation of uncorrected SPOs did not improve the altimeter analyses and 1-h forecasts in case I; however, uncorrected SPOs were able to improve analyses/forecasts of 2-m temperature. This unintuitive result was the consequence of a cancellation of biases, wherein negatively biased smartphone pressures induced positive temperature increments that reversed a systematic negative temperature bias in the control experiment. In no-cycling/cycling DA experiments, SPOs did not improve wind analyses/forecasts, a result reflecting the lack of correlation between ensemble estimates of pressure and wind. The magnitude of the analysis and forecast error reductions achieved by assimilating corrected SPOs was directly proportional to the number of observations assimilated and the magnitude of the correlation of surface pressure with the surface variable evaluated.
In case I, both corrected and, to a lesser degree, uncorrected SPOs improved 1-h forecast precipitation skill relative to the control simulation without smartphone observations. Improvements in the fractions skill score were most notable during the postfrontal period when the assimilation of corrected SPOs improved mesoscale forecasts of postfrontal convective precipitation along the Oregon coast. Free-forecast experiments showed that assimilating corrected SPOs resulted in a significant reduction in forecast RMSEs for altimeter setting, 2-m temperature, and 2-m dewpoint at forecast lead times of 3–6 h.
Case II considered a storm poorly forecast by operational systems. The assimilation of both corrected and uncorrected SPOs significantly improved altimeter and 10-m wind forecasts during the period of storm landfall. In this case, SPO quality had little impact on forecast performance since errors in uncorrected SPOs were dwarfed by the magnitude of the pressure errors in the control ensemble. In cycled SPO assimilation experiments, errors in the analyzed track and intensity of the windstorm were markedly reduced as the storm approached landfall. Free-forecast experiments demonstrated that such reductions in model analysis errors were associated with improvements in the forecast track and intensity of the windstorm at short lead times. In both cycling and free-forecast SPO experiments, improvements in the forecast storm track resulted in commensurate improvements to probabilistic near-surface wind forecasts.
In the region used in these experiments (the Pacific Northwest), there are likely over a million smartphones capable of retrieving pressure. This would imply that in the experiments discussed above less than 0.1% of potential SPOs were assimilated. In this study, sensitivity experiments revealed that domain-averaged analysis error, relative to the control, decreased monotonically as the number of assimilated smartphone observations was increased. Since just over a thousand hourly SPOs performed similarly to existing mesoscale pressure networks in constraining forecasts of pressure, temperature, and dewpoint, it is plausible that greater reductions in analysis/forecast error are possible if a considerably denser network were available. MM2018 showed that such a network is feasible by demonstrating that smartphone pressures can be efficiently collected and bias corrected at subhourly intervals. This study confirms the methodology of MM2018 and suggests that crowdsourced smartphone pressures can enhance operational numerical weather prediction.
The authors would like to acknowledge uWx users, whose cooperation made this research possible, the Weather Company (IBM) for their generous financial support of this research, and Microsoft for providing cloud-computing time and resources. Support was also provided by a grant from the NOAA CSTAR program through Grant NA10OAR4320148AM63.
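In this study, ensemble forecasts were evaluated using the National Center for Atmospheric Research (NCAR) Model Evaluation Tools (MET; Fowler et al. 2018). MET was used to calculate several verification metrics for ensemble mean forecasts. The first metric, mean error (bias), was computed as the domain-average difference between the ensemble mean forecast and verifying observation at each observation location:
$$\mathrm{bias} = \frac{1}{n}\sum_{i=1}^{n}\left(\bar{f}_i - o_i\right), \tag{A1}$$
where $\bar{f}_i$ is the ensemble mean forecast at observation location $i$, $o_i$ is the verifying observation, and $n$ is the number of observation locations. The second metric, RMSE, was computed as an average over the model domain:
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\bar{f}_i - o_i\right)^2}. \tag{A2}$$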
In all DA experiments, RMSE and bias were evaluated with high-quality METAR observations.
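As an illustration of the two deterministic metrics described above, the following minimal Python sketch computes the bias and RMSE of an ensemble mean forecast against point observations. It is not the MET implementation; the function names and the altimeter values are hypothetical, and the forecasts are assumed to have already been interpolated to the observation locations.

```python
import math


def ensemble_mean(members):
    """Average equally weighted members at each observation location.

    `members` is a list of member forecasts, each a list with one value
    per observation location (assumed already interpolated to the sites).
    """
    n_members = len(members)
    return [sum(values) / n_members for values in zip(*members)]


def bias_and_rmse(members, observations):
    """Return (bias, rmse) of the ensemble mean against the observations."""
    mean_forecast = ensemble_mean(members)
    errors = [f - o for f, o in zip(mean_forecast, observations)]
    n = len(errors)
    bias = sum(errors) / n                            # mean error
    rmse = math.sqrt(sum(e * e for e in errors) / n)  # root-mean-square error
    return bias, rmse


# Hypothetical altimeter settings (hPa): two members, three stations.
members = [
    [1012.0, 1008.0, 1004.0],
    [1014.0, 1010.0, 1002.0],
]
observations = [1012.0, 1010.0, 1005.0]

bias, rmse = bias_and_rmse(members, observations)
print(f"bias = {bias:.3f} hPa, RMSE = {rmse:.3f} hPa")
```

A negative bias in this sketch means the ensemble mean is lower than the observations, which corresponds to the low pressure bias discussed for the uncorrected SPO experiments.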
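Ensemble probabilistic forecast skill was evaluated using the Brier score (BS; Brier 1950). The BS is analogous to the mean squared error for probabilistic forecasts:
$$\mathrm{BS} = \frac{1}{T}\sum_{t=1}^{T}\left(p_t - o_t\right)^2. \tag{A3}$$
In the Brier score, $p_t$ represents the fraction of ensemble members that forecast an event to occur at time $t$, while $o_t$ defines whether an event was observed to occur at time $t$ ($o_t = 1$ if it was observed and $o_t = 0$ otherwise); $T$ is the number of forecast–observation pairs. In this study, the BS is used to calculate the Brier skill score as
$$\mathrm{BSS} = 1 - \frac{\mathrm{BS}_{\mathrm{FCST}}}{\mathrm{BS}_{\mathrm{CNTRL}}}, \tag{A4}$$
and the CNTRL ensemble is used as the reference forecast. When the BSS is negative (positive), the ensemble FCST is less (more) skillful than CNTRL.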
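The Brier score and Brier skill score can be sketched in a few lines of Python. The example below is illustrative only: the function names, the four-member ensembles, and the wind values are hypothetical, and the event is taken to be a 10-m wind speed exceeding 10 m s−1, as in the wind forecast analysis.

```python
def event_probability(member_values, threshold):
    """Fraction of ensemble members forecasting the event (value > threshold)."""
    hits = sum(1 for value in member_values if value > threshold)
    return hits / len(member_values)


def brier_score(probabilities, outcomes):
    """Mean squared difference between forecast probability and outcome (0 or 1)."""
    pairs = list(zip(probabilities, outcomes))
    return sum((p - o) ** 2 for p, o in pairs) / len(pairs)


def brier_skill_score(bs_forecast, bs_reference):
    """BSS relative to a reference forecast; undefined if bs_reference is 0."""
    return 1.0 - bs_forecast / bs_reference


THRESHOLD = 10.0  # m/s

# Hypothetical 10-m wind speeds (m/s): one list of member values per time.
experiment_members = [[12, 11, 9, 13], [8, 9, 11, 7], [14, 12, 11, 15]]
control_members = [[9, 8, 11, 7], [12, 11, 9, 8], [11, 9, 12, 8]]
observed_event = [1, 0, 1]  # 1 if the observed wind exceeded the threshold

p_experiment = [event_probability(m, THRESHOLD) for m in experiment_members]
p_control = [event_probability(m, THRESHOLD) for m in control_members]

bs_experiment = brier_score(p_experiment, observed_event)
bs_control = brier_score(p_control, observed_event)
bss = brier_skill_score(bs_experiment, bs_control)
print(f"BS (experiment) = {bs_experiment:.4f}, BS (control) = {bs_control:.4f}")
print(f"BSS = {bss:.4f}")
```

In this made-up example the BSS is positive, so the hypothetical experiment is more skillful than the hypothetical control for this event.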
To evaluate the performance of ensemble forecasts across spatial scales, the fractions Brier score (FBS; Roberts 2005) is used. The FBS is defined as
$$\mathrm{FBS} = \frac{1}{N}\sum_{i=1}^{N}\left[P_{F(i)} - P_{O(i)}\right]^2, \tag{A5}$$
where N is the number of neighborhoods. Neighborhoods N are defined using a radius of influence r. At each grid point $i$, a neighborhood is defined as a square grid of all grid points within r kilometers of $i$. In Eq. (A5), $P_{F(i)}$ represents the fraction of grid points (i.e., fractional coverage) of a binary metric (e.g., precipitation accumulation ≥ 1 mm) within a forecast neighborhood, at each grid point $i$. Likewise, $P_{O(i)}$ is the fractional coverage of a binary metric within an observed neighborhood, at each grid point $i$. In this study, the FBS is used within the context of the fractions skill score (FSS; Roberts and Lean 2008). The FSS is calculated as
$$\mathrm{FSS} = 1 - \frac{\mathrm{FBS}}{\dfrac{1}{N}\left[\sum_{i=1}^{N}P_{F(i)}^{2} + \sum_{i=1}^{N}P_{O(i)}^{2}\right]}, \tag{A6}$$
where the denominator represents the worst possible FBS (i.e., observed and forecast events have no spatial overlap). For the purposes of this study, the FSS is evaluated using the neighborhood ensemble probability approach outlined in Schwartz et al. (2010). In this approach, $P_{F(i)}$ represents the fraction of ensemble members that exceed a given threshold within neighborhood N, at each grid point $i$.
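The neighborhood calculation described above can be sketched as follows. This is a simplified illustration rather than the MET implementation. It makes three assumptions: the neighborhood half-width is given in grid points (the radius r divided by the grid spacing), neighborhoods are truncated at the domain edge, and the neighborhood ensemble probability is the ensemble average of each member's fractional coverage. All function names and the toy 3 × 3 precipitation fields are hypothetical.

```python
def fractional_coverage(binary_grid, half_width):
    """Fraction of event points in the square neighborhood of each grid point.

    `half_width` is the neighborhood half-width in grid points. Neighborhoods
    are truncated at the domain edge (an assumption of this sketch).
    """
    ny, nx = len(binary_grid), len(binary_grid[0])
    coverage = [[0.0] * nx for _ in range(ny)]
    for j in range(ny):
        for i in range(nx):
            rows = range(max(0, j - half_width), min(ny, j + half_width + 1))
            cols = range(max(0, i - half_width), min(nx, i + half_width + 1))
            cells = [binary_grid[jj][ii] for jj in rows for ii in cols]
            coverage[j][i] = sum(cells) / len(cells)
    return coverage


def exceeds(field, threshold):
    """Binary grid: 1 where the field meets or exceeds the threshold, else 0."""
    return [[1 if value >= threshold else 0 for value in row] for row in field]


def neighborhood_ensemble_probability(member_fields, threshold, half_width):
    """Ensemble average of each member's neighborhood fractional coverage."""
    covers = [fractional_coverage(exceeds(f, threshold), half_width)
              for f in member_fields]
    ny, nx = len(covers[0]), len(covers[0][0])
    n_members = len(covers)
    return [[sum(c[j][i] for c in covers) / n_members for i in range(nx)]
            for j in range(ny)]


def fractions_skill_score(forecast_fraction, observed_fraction):
    """FSS = 1 - FBS / (worst possible FBS); NaN if no events occur anywhere."""
    pf = [v for row in forecast_fraction for v in row]
    po = [v for row in observed_fraction for v in row]
    n = len(pf)
    fbs = sum((f - o) ** 2 for f, o in zip(pf, po)) / n
    worst = (sum(f * f for f in pf) + sum(o * o for o in po)) / n
    return 1.0 - fbs / worst if worst > 0 else float("nan")


# Toy hourly precipitation (mm): one-member ensemble vs. an analysis with
# rain displaced to the opposite corner of a 3 x 3 domain.
member_fields = [[[2.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]]
analysis = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 2.0]]

for half_width in (0, 1):
    p_f = neighborhood_ensemble_probability(member_fields, 1.0, half_width)
    p_o = fractional_coverage(exceeds(analysis, 1.0), half_width)
    print(half_width, round(fractions_skill_score(p_f, p_o), 4))
```

With a half-width of zero the displaced rain gives no overlap and an FSS of 0, while widening the neighborhood gives partial credit. This is the scale dependence the FSS is designed to expose.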