## Abstract

This study addresses the issue of improving nowcasting accuracy by integrating several numerical weather prediction (NWP) model forecasts with observation data. To derive the best algorithms for generating integrated forecasts, different integration methods are applied, starting with integration of the NWP models using equal weighting. Various refinements are then successively applied, including dynamic weighting, variational bias correction, adjusted dynamic weighting, and constraints using current observation data. The integrated forecasts are generated from three NWP models: the Canadian Global Environmental Multiscale (GEM) regional model, the GEM Limited Area Model (LAM), and the American Rapid Update Cycle (RUC) model. Verification is performed at two Canadian airport locations [Toronto International Airport (CYYZ), in Ontario, and Vancouver International Airport (CYVR), in British Columbia] over the winter and summer seasons. The verification results for four weather variables (temperature, relative humidity, and wind speed and gust) clearly show that the integrated models with the new refinements almost always perform better than each of the NWP models individually and collectively. When the integrated model with innovative dynamic weighting and variational bias correction is further updated with the most current observation data, its performance is the best among all models for all the selected variables, regardless of location and season. The results of this study justify the use of integrated NWP forecasts for nowcasting, provided they are properly integrated using appropriate and specifically designed rules and algorithms.

## 1. Introduction

Nowcasting is short-period weather forecasting concerned with current weather conditions and the changes over the next few tens of minutes to the ensuing 6 h. In contrast to the familiar synoptic weather forecast (beyond 6 h), nowcasting is also highly location specific and requires data of very high spatial and temporal resolution to produce accurate forecasts. The primary aim of nowcasting is to predict significant weather events with high specificity as to their onset, duration, intensity, severity, and location, which are especially important for many organizations and situations.

Most “traditional” nowcast systems focus on precipitation. There is a view that every weather variable should be nowcastable with high accuracy. However, the importance and utility of the various weather variables may differ between clients or activities. The four variables (temperature, relative humidity, and wind speed and gust) used in this study were selected for their importance within the nowcasting context, although they are rarely addressed in traditional nowcasting. Wind speed and gust, for instance, are of vital importance to aviation, while temperature and relative humidity are important for deciding when to salt and sand for winter road maintenance. The air navigation authority in Canada, NAV CANADA, indicates that it is very important to predict crosswinds, which can close down certain runways (Isaac et al. 2012b, hereafter IMA). The accuracy of the forecasts of these variables also greatly affects the accuracy of visibility and fog parameterizations that are critically important to aviation.

Traditionally, nowcasting methods are largely based on the analysis and time extrapolation of single-variable trends (mostly precipitation) using data collected from a wide range of sources including weather stations, radar mosaics, and satellite images (Austin and Bellon 1974; Garand 1993; Joe et al. 2004; Poli et al. 2008). Analyses are usually straightforward with the assumption that the rate of change is constant or involving various empirical and rule-based constructs. Most of the present methods are heavily dependent on dense, local observation networks delivering frequently updated observation data for their accuracy. If the high data quality requirement is satisfied, these methods can generally produce accurate forecasts in the 0–3-h period but rapidly lose accuracy thereafter (Austin and Bellon 1974; Browning and Collier 1989; Golding 1998; Lin et al. 2005).

To extend the period of nowcast predictability, various numerical weather prediction (NWP) models that were originally developed specifically to produce short- and medium-range (synoptic) forecasts are now also being used to do nowcasting. But the detail of these synoptic forecasts is far below the standards required for use in nowcasting because of spinup effects, imperfect assimilation algorithms, time delays with assimilation analysis, and models having too coarse a spatial and temporal resolution (Bock and Nuret 2009; Dee 2005; Polavarapu et al. 2005). NWP models at all scales have comparatively low skill within the nowcasting range of 0–6 h, particularly when applying them to areas of complex terrain (Haiden et al. 2011; Huang et al. 2012a,b, manuscripts submitted to *Pure Appl. Geophys.*, hereafter HISa,b; Isaac et al. 2012a, hereafter IPAG). Thus, there is a large gap between the needs of nowcasting and what NWP models can provide in terms of the accuracy of very short-term predictions and the required spatial and temporal specificity.

With the recent development of integrated environmental monitoring, advanced integrated nowcasting techniques are taking advantage of the rich data outputs from such monitoring networks, as well as models, and attempting to improve the prediction period (out to 6 h) (Bowler et al. 2006; Golding 1998; Haiden et al. 2011; Huang 2011; IMA; Mueller et al. 2003; Rasmussen et al. 2001). For example, Nimrod is a system for generating automated, very short-range forecasts for precipitation, cloud, and visibility by using radar and satellite data together with surface reports and NWP fields (Golding 1998). Integrated Nowcasting through Comprehensive Analysis (INCA) is an integrated nowcasting system that provides nowcasts of temperature, humidity, wind, precipitation amount, precipitation type, cloudiness, and global radiation (Haiden et al. 2011). However, most integrated techniques focus on integrated nowcasting with forecasts from a single NWP model.

Recently, NWP models, model output statistics (MOS), updateable model output statistics (UMOS), and ensembles with different spatial and temporal resolutions have become available for many areas. When instrumentation and technology are sufficient to satisfy the high-density, high spatial and temporal resolution data requirements for nowcasting, and if traditional NWP model forecasts and modeling techniques can be properly adapted for nowcasting, then the integration of the two should lead to the required improvement in accuracy. However, how can we “properly adapt” NWP model forecasts for nowcasting?

In a milestone paper, Clemen (1989) surveyed over 200 studies from diverse fields, such as forecasting, psychology, statistics, and management science. His main conclusions are that combining multiple forecasts increases forecast accuracy, and that this trend is consistent whether the forecasts are judgmental, statistical, econometric, or based on extrapolation. A rather surprising finding is that, in many cases, dramatic improvements in accuracy can be achieved by simply averaging the forecasts, and this often rivals the results from more sophisticated statistical approaches (Armstrong 1989; Clemen 1989).

This principle was applied to find the optimal algorithms for providing improved nowcasts by integrating observations with forecasts from the different available models. Five different integration methods were designed and verified at two Canadian airports, Toronto Pearson International Airport (CYYZ), in Ontario, and Vancouver International Airport (CYVR), in British Columbia, during winter and summer periods.

In section 2, the equations and algorithms that define each of the integration methods are described. Starting with the integration of NWP models using equal weighting, each refinement successively applies the following: dynamic weighting, bias correction, adjusted dynamic weighting, and constraints using current observation data. Section 3 describes the verification of each method with the successive refinements mentioned above. Discussion and conclusions are given in sections 4 and 5, respectively.

## 2. Methodology

It is well known that nowcasting requires purposefully built models with higher spatial resolution, improved location specificity, increased temporal resolution of observation data, and greater frequency of model updating. Since there is a general lack of numerical models specifically designed and developed to do nowcasting, nowcasting is often based on one or several available NWP models regardless of the spatial resolution for a particular location. When there are different NWP models for a specific location, how does one choose the best NWP model for nowcasting? Is it possible to integrate NWP forecasts to generate optimal nowcasts? If yes, how would it be done? In this study, the different integration methods and how the refinements are successively applied to integrate the NWP models and observation data are described.

### a. Equal weighting integration (INT_EW)

It is assumed that there are *n* NWP models that can be used for generating integrated forecasts. In this example, *F*_{i,j,k} is the forecast from an NWP model. For the description here and below, *i* denotes the forecast lead time, *j* an NWP model, and *k* a forecast variable. An equal weight (ew_{j,k}) for a forecast variable *k* from an NWP model *j* is defined by

$$
\mathrm{ew}_{j,k} = \frac{1}{n}. \tag{1}
$$

The equally weighted integrated forecast [INT_EW(*i,k*)] is then defined as

$$
\mathrm{INT\_EW}(i,k) = \sum_{j=1}^{n} \mathrm{ew}_{j,k}\, F_{i,j,k}. \tag{2}
$$

In general, simple averaging of two (or three) independent, completely uncorrelated, estimates with equal error variance will reduce the root-mean-square error (RMSE) by a factor of sqrt(2) [or sqrt(3)]. If there is a correlation between the errors of different forecast models, the reduction will depend on individual forecast errors.
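The factors quoted above follow from the variance of a mean. As a supporting derivation (standard statistics rather than part of the original analysis), suppose each of the *n* model errors *e*_{j} has zero mean and the same variance σ², and that every pair of errors has the same correlation ρ, which is an idealization introduced here for illustration:

$$
\operatorname{Var}\left(\frac{1}{n}\sum_{j=1}^{n} e_j\right)
= \frac{1}{n^{2}}\left[n\sigma^{2} + n(n-1)\rho\sigma^{2}\right]
= \frac{\sigma^{2}}{n}\left[1 + (n-1)\rho\right],
\qquad
\mathrm{RMSE}_{\mathrm{avg}} = \sigma\sqrt{\frac{1 + (n-1)\rho}{n}} .
$$

With ρ = 0 this gives σ/sqrt(*n*), that is, the reduction by sqrt(2) or sqrt(3) for two or three models. As ρ approaches 1, the benefit of averaging vanishes.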

Averaging forecasts can reduce the spread of their predictions as well as some random error. For extreme events, averaging may reduce the likelihood of predicting such events.

This is why the various refinements and rules described below (i.e., dynamic weighting, bias correction, and observation data constraint) are used to try to find the best integration process for maximizing the likelihood that extreme events can still be predicted.

### b. Dynamic weighting integration (INT_DW)

Since NWP model performance varies by time, location, and variable (Huang 2011; IPAG; Mass et al. 2008), appropriate weighting is essential (Lange et al. 2006; Dickinson 1975; Huang 2011; Nielsen et al. 2007). A dynamic weighting scheme may be needed for each variable at each location for a specific model run time. The weights for the variables of each NWP model can be derived according to how well each model performed most recently (within the past several hours or days). That is, the closer an NWP model forecast is to the observation data, the larger the weight assigned to the model producing the forecast. Specifically, in this study, the mean absolute error (MAE) for the past 6 h is used to derive weights for each model.

There are several reasons to use the most recent model performance rather than the long-period average of performance for deriving weights: 1) model performance most relevant to the nowcasting period is heavily influenced by the most recent physical situation, 2) information over a long period may introduce unnecessary information for nowcasting, 3) results are not affected by frequent NWP model modification, 4) the method is relatively easy to implement, and 5) extensive data archiving is not needed.

The weight for each variable of the model can be derived by

$$
\mathrm{dw}_{j,k} = \frac{1/P_{j,k}}{\sum_{m=1}^{n} 1/P_{m,k}}, \tag{3}
$$

where dw_{j,k} is the weight for each variable *k* of the model *j* and it is inversely proportional to the performance measure *P*_{j,k}. The dw_{j,k} is calculated at each run time to generate the integrated model.

In this study, *P*_{j,k} is the mean absolute error of variable *k* from NWP model *j* during the past 6 h:

$$
P_{j,k} = \frac{1}{N}\sum_{i=1}^{N} \left|F_{i,j,k} - O_{i,k}\right|, \tag{4}
$$

where *F*_{i,j,k} is the forecast of variable *k* at time *i* from an NWP model *j* and *O*_{i,k} is the observation value of variable *k* at time *i*. In addition, *N* is the sample size of paired forecasts and observations for the comparison during the past 6 h.

The integrated forecasts [INT_DW(*i,k*)] can then be generated using Eq. (2), with dw_{j,k} in place of ew_{j,k}.
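The dynamic weighting step can be sketched in a few lines of Python. This is a minimal illustration rather than the operational code: the function names, the array layout (models along the first axis), the normalization of the inverse-MAE weights so that they sum to one, and the small `eps` guard against a zero MAE are all assumptions made here.

```python
import numpy as np

def mae_past_window(past_forecasts, past_obs):
    """Performance measure P[j]: MAE of each model over the recent window.

    past_forecasts has shape (n_models, n_times); past_obs has shape (n_times,).
    Missing pairs can be marked with NaN and are skipped.
    """
    errors = np.abs(np.asarray(past_forecasts, float) - np.asarray(past_obs, float))
    return np.nanmean(errors, axis=1)

def dynamic_weights(p, eps=1e-6):
    """Weights inversely proportional to P[j], normalized to sum to one (assumed)."""
    inverse = 1.0 / np.maximum(np.asarray(p, float), eps)  # eps avoids division by zero
    return inverse / inverse.sum()

def integrate(forecasts, weights):
    """Weighted sum over models; forecasts has shape (n_models, n_leads)."""
    return np.asarray(weights, float) @ np.asarray(forecasts, float)

# Hypothetical window: the three models were off by 1, 2, and 4 units.
obs = np.array([0.0, 1.0, 2.0, 3.0])
past = np.array([obs + 1.0, obs + 2.0, obs + 4.0])
weights = dynamic_weights(mae_past_window(past, obs))
nowcast = integrate(np.array([[10.0], [12.0], [16.0]]), weights)
```

Passing equal weights to `integrate` reproduces the equal weighting method, so the two methods differ only in how the weights are chosen.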

### c. Equal weighting plus bias correction (INT_EWB)

NWP models often contain systematic errors due to the initialization and parameterization of physical phenomena and large-scale pattern effects (Bock and Nuret 2009; Dee 2005; Mass et al. 2008). In general, the biases between NWP models and observation are not fixed offsets. Rather, they can vary with time, with geographical location, and with forecast variables (Huang 2011; IPAG; Mass et al. 2008). In this study, the criteria for bias correction are defined as follows: if all NWP models produce forecasts that are consistently over, or under, the observation values for the past 6 h, model bias is deemed to exist, and a bias correction is needed. The bias-corrected forecast [INT_EWB(*i,k*)] is defined as

where *b*_{i,j,k} is the bias correction factor for variable *k* of model *j* at forecast lead time *i* and *B*_{k} is the bias term. Considering that the future performance of each model may not be highly related to the model’s past performance, and positive and/or negative biases from different models may exist for the next round of the forecast period, *B*_{k} is empirically defined as the minimal difference between the most recent observation and the NWP forecasts at the same time.

The value of *b*_{i,j,k} varies from 0 to 1 and depends on the difference between the forecast value and the current observation value, as well as the forecast lead time.

The process of detecting bias is dynamic and is done at each 10-min time step in this study.
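The bias detection criterion and the bias term lend themselves to a short sketch. The code below covers only those two pieces and does not apply the correction itself. Several readings are assumptions made here: "consistently over, or under" is taken to mean that every model is on the same side of the observation at every time step in the window, "minimal difference" is taken to mean the difference of smallest magnitude, and the sign convention is forecast minus observation. The function names are hypothetical.

```python
import numpy as np

def detect_common_bias(past_forecasts, past_obs):
    """Return +1 if all models overforecast throughout the window,
    -1 if all underforecast, and 0 if no common bias is detected.

    past_forecasts has shape (n_models, n_times); past_obs has shape (n_times,).
    """
    diff = np.asarray(past_forecasts, float) - np.asarray(past_obs, float)
    if np.all(diff > 0):
        return 1
    if np.all(diff < 0):
        return -1
    return 0

def bias_term(latest_forecasts, latest_obs):
    """B_k: signed forecast-minus-observation difference of smallest magnitude
    among the models, evaluated at the time of the most recent observation."""
    diff = np.asarray(latest_forecasts, float) - float(latest_obs)
    return float(diff[np.argmin(np.abs(diff))])

# Hypothetical window in which both models run warm.
obs = np.array([1.0, 2.0, 3.0])
models = np.array([[2.0, 3.0, 4.0], [3.0, 4.0, 5.0]])
flag = detect_common_bias(models, obs)
b_k = bias_term(models[:, -1], obs[-1])
```

Using the smallest difference rather than the mean keeps the correction conservative, which matches the stated concern that past model performance may not carry over to the next forecast period.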

### d. Adjusted dynamic weighting plus bias correction (INT_AWB)

Although the weight assigned to each NWP model is inversely proportional to its MAE (method b), if the MAE is too large, then it would be better to eliminate that model’s contribution for that variable entirely. Thus, if the MAE of a model for any particular variable exceeds a certain threshold, then the weight of that model for that variable is assigned a value of 0. The threshold for eliminating a model from the weighting scheme is expressed in terms of the model weight and depends on the number of selected NWP models. For example, a model is eliminated if its weight is less than or equal to 0.1 when three NWP models are used, or less than or equal to 0.2 when two NWP models are used. The reassigned weight is based on the performance of the other models and is termed the adjusted weight.
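The elimination rule can be sketched as follows. The cutoffs of 0.1 (three models) and 0.2 (two models) come from the example in the text; renormalizing the surviving weights so that they sum to one, leaving the weights unchanged for any other number of models, and the function name `adjusted_weights` are assumptions made for illustration.

```python
import numpy as np

# Weight cutoffs quoted in the text, keyed by the number of NWP models.
CUTOFFS = {3: 0.1, 2: 0.2}

def adjusted_weights(dw):
    """Zero out poorly performing models and renormalize the rest (assumed)."""
    dw = np.asarray(dw, float)
    cutoff = CUTOFFS.get(dw.size)
    if cutoff is None:          # no rule stated for other model counts
        return dw.copy()
    keep = dw > cutoff          # a weight equal to the cutoff is eliminated
    if not keep.any():          # defensive: never drop every model
        return dw.copy()
    aw = np.where(keep, dw, 0.0)
    return aw / aw.sum()

aw_three = adjusted_weights([0.60, 0.32, 0.08])   # third model is dropped
aw_two = adjusted_weights([0.85, 0.15])           # second model is dropped
```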

The integrated forecast from adjusted dynamic weighting plus bias correction [INT_AWB(*i,k*)] can be described by

### e. Adapting nowcasting and NWP techniques (INT_OWB)

Persistence, or pure time extrapolation, can produce fairly accurate forecasts in the 0–3-h time frame. Thus, the forecasts produced from Eq. (6) above may be further improved by using the current observation value as a constraint.

It is assumed that *d*_{i,k} is the amount of deviation between the integrated forecast [INT_AWB(*i,k*)] and the most recent observation (*O*_{k}) and is calculated using

The constraint amount (*C*_{i,k}) is defined by

where *a*_{i,k} is an extrapolation factor and is empirically defined based on certain rules and with values ranging from 0 to 0.2.

If the integrated forecasts for the first couple of hours are in the range of (*O*_{k} − *C*_{i,k}) and (*O*_{k} + *C*_{i,k}) or *d*_{i,k} < *C*_{i,k}, no adjustments are needed. Otherwise, the integrated forecast [INT_OWB(*i,k*)] will be defined by

The length of forecast lead time for the integrated forecasts to be adjusted is dependent on the particular forecast variables being used. For example, the adjustment for forecasts of wind is done only for the first half-hour. For temperature and relative humidity, the adjustment period is empirically defined as 2 h in this study.
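The logic of the observation constraint can be illustrated with a heavily hedged sketch. Only the acceptance test (a forecast within the constraint amount of the most recent observation is left alone) and the variable-dependent adjustment periods are taken from the text. How an out-of-range forecast is adjusted is not recoverable here, so the code simply pulls it back to the nearest edge of the allowed window, which is an assumption made for illustration and not necessarily the rule used in this study. The constraint amounts are passed in as an input rather than computed, and all names are hypothetical.

```python
import numpy as np

# Adjustment periods stated in the text, in minutes of lead time.
ADJUST_MINUTES = {"wind": 30, "temperature": 120, "relative_humidity": 120}

def constrain_to_observation(int_awb, latest_obs, c, lead_minutes, adjust_minutes):
    """Constrain early lead times to stay near the most recent observation.

    int_awb, c, and lead_minutes are arrays over forecast lead time.
    """
    forecast = np.asarray(int_awb, float)
    c = np.asarray(c, float)
    lead = np.asarray(lead_minutes, float)
    deviation = np.abs(forecast - latest_obs)
    needs_adjustment = (lead <= adjust_minutes) & (deviation >= c)
    # Assumed adjustment: clamp to the window [obs - c, obs + c].
    clamped = np.clip(forecast, latest_obs - c, latest_obs + c)
    return np.where(needs_adjustment, clamped, forecast)

adjusted = constrain_to_observation(
    int_awb=[5.0, 4.0, 2.5, 6.0],
    latest_obs=2.0,
    c=[1.0, 1.0, 1.0, 1.0],
    lead_minutes=[10, 20, 30, 130],
    adjust_minutes=ADJUST_MINUTES["temperature"],
)
```

In this made-up example the first two lead times are pulled back toward the observation, the third already lies inside the window, and the fourth falls outside the 2-h adjustment period and is left unchanged.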

## 3. Verification

In this study, three NWP models are used to generate integrated nowcasts of four variables (temperature, relative humidity, and wind speed and gust) using the five different methodologies described above at two locations during winter and summer periods.

### a. Research datasets

Two Canadian airports (CYYZ and CYVR) from the Canadian Airport Nowcasting (CAN-Now) project (IMA) are used for testing the five different methods.

The observations were collected at 1-min intervals using an M300 data acquisition system (DAS). M300 is a computer name and it is also used to represent 1-min measurement data in CAN-Now. Temperature and relative humidity (RH) were measured by a Vaisala HMP45C212, which is an air temperature and RH sensor made by Vaisala. Wind speed and gust in 2D were measured by a Vaisala WS425 Ultra Sonic Wind Sensor at CYYZ and an AES 78D Cup Anemometer at CYVR (IMA).

The three NWP models used for generating the integrated models are the Canadian Global Environmental Multiscale (GEM) regional (REG) model, the GEM Limited Area Model (LAM), and the American Rapid Update Cycle (RUC) model.

The GEM REG data were obtained at a frequency of 7.5 min with a horizontal resolution of 15 km and provided forecasts up to 48 h. The REG data were processed four times daily at 0000, 0600, 1200, and 1800 UTC (Côté 1998a,b; Mailhot et al. 2006). In this study, the first 3 h of forecast data were removed from the analysis because of spinup issues. Since the model ran 4 times daily, the REG forecast data actually used for the verification were the model outputs from 3 to 9 h.

The GEM LAM data were obtained at a frequency of 5 min with a horizontal resolution of 2.5 km and provided forecasts up to 24 h. The data were processed once daily at 1200 UTC for CYYZ. GEM LAM 2.5 km ran twice per day at 0900 and 2100 UTC for CYVR (Fillion et al. 2010). No spinup time was removed, mainly because there was only one run per day at CYYZ and the run only went to 24 h. The forecast data actually used from LAM for the verification are the model outputs from 0 to 24 h for CYYZ and 0 to 12 h for CYVR. Both REG and LAM data are from the Canadian Meteorological Center (CMC) of Environment Canada.

The RUC model data were obtained at a frequency of 1 h with a horizontal resolution of 13 km and provided forecasts up to 12 h. The data were processed every hour. The RUC model is described in Benjamin et al. (2004). The RUC data come from the National Oceanic and Atmospheric Administration/National Centers for Environmental Prediction (NOAA/NCEP).

For all the NWP models, the forecasts from the nearest-neighbor grid point to the study locations are selected. The locations of M300 instruments and airports, and three NWP grid points, are indicated on the map in Fig. 1.

The verification dataset uses data from the most current model run. All data are reduced or interpolated into 10-min intervals (IMA). Specifically,

- for all REG variables, the last instantaneous value was used;
- for LAM and instrument data, the average during the last 10 min for temperature, relative humidity, wind speed, and wind gust was used; and
- for RUC data, linear interpolation to 10 min was performed. Only the RUC 6-h forecasts updated every hour were compared in the verification.
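The three reduction rules above can be sketched with NumPy. This is an illustration under stated assumptions: times are given in minutes (or hours for the hourly data) and are sorted, a sample at minute *t* is assigned to the 10-min interval that ends at the next multiple of 10 (so minutes 1 to 10 belong to the interval ending at 10), and the function names are hypothetical.

```python
import numpy as np

def interval_end(times_min, step=10):
    """End time of the interval each sample belongs to (assumed convention)."""
    return (np.ceil(np.asarray(times_min, float) / step) * step).astype(int)

def reduce_last(times_min, values, step=10):
    """REG rule: keep the last instantaneous value in each interval."""
    reduced = {}
    for end, value in zip(interval_end(times_min, step), values):
        reduced[int(end)] = float(value)   # later samples overwrite earlier ones
    return reduced

def reduce_mean(times_min, values, step=10):
    """LAM and instrument rule: average over each interval."""
    ends = interval_end(times_min, step)
    values = np.asarray(values, float)
    return {int(end): float(values[ends == end].mean()) for end in np.unique(ends)}

def interpolate_hourly(hours, values, step=10):
    """RUC rule: linear interpolation of hourly values to a 10-min grid."""
    minutes = np.asarray(hours, float) * 60.0
    target = np.arange(minutes[0], minutes[-1] + step, step)
    return target, np.interp(target, minutes, values)

reg = reduce_last([7.5, 15.0, 22.5, 30.0], [1.0, 2.0, 3.0, 4.0])
lam = reduce_mean([5, 10, 15, 20], [1.0, 3.0, 5.0, 7.0])
ruc_times, ruc_values = interpolate_hourly([0, 1], [10.0, 16.0])
```

Because the three models arrive at 7.5-min, 5-min, and 1-h frequencies, putting them on a common 10-min grid is what allows the weights and the MAE statistics to be computed from paired samples.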

Two wind gust schemes are used for the NWP models. GEM REG and LAM use a version of the Brasseur (2001) scheme as modified by F. Boudala (2010, personal communication), and the RUC model has its own operational version.

### b. Nowcasting verification

The verification is performed for two periods. The winter period is from 1 December 2009 to 31 March 2010. The summer period is from 1 June to 31 August 2010. The MAE, bias or mean error (ME), and percentage of improvement (POI) are calculated for each variable. The MAE, ME, and POI are calculated using

$$
\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N} \left|F_{i} - O_{i}\right|, \tag{10}
$$

$$
\mathrm{ME} = \frac{1}{N}\sum_{i=1}^{N} \left(F_{i} - O_{i}\right), \tag{11}
$$

$$
\mathrm{POI} = \frac{V_{1} - V_{2}}{V_{1}} \times 100\%, \tag{12}
$$

where *F*_{i} is the *i*th forecast from a model and *O*_{i} is the *i*th observation. In addition, *N* is the total number of data pairs and *V*_{1} is the average MAE from three NWP models while *V*_{2} is the MAE from any integrated model.
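For readers who want to reproduce the verification statistics, the three metrics can be computed as sketched below. Two conventions are assumed here: the mean error is taken as forecast minus observation, so a positive value indicates overforecasting, and the percentage of improvement is the relative reduction in MAE with respect to the NWP average. The function names are hypothetical.

```python
import numpy as np

def mae(forecasts, observations):
    """Mean absolute error over N forecast-observation pairs."""
    diff = np.asarray(forecasts, float) - np.asarray(observations, float)
    return float(np.mean(np.abs(diff)))

def mean_error(forecasts, observations):
    """Bias (ME), forecast minus observation (assumed sign convention)."""
    diff = np.asarray(forecasts, float) - np.asarray(observations, float)
    return float(np.mean(diff))

def poi(v1, v2):
    """Percentage of improvement of an integrated model (MAE v2)
    over the average MAE of the NWP models (v1)."""
    return 100.0 * (v1 - v2) / v1

example_mae = mae([1.0, 3.0, 5.0], [2.0, 2.0, 2.0])
example_me = mean_error([1.0, 3.0, 5.0], [2.0, 2.0, 2.0])
example_poi = poi(2.0, 1.0)
```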

It should be noted that, in the verification process, the metrics obtained from each model are relevant to the accuracy of forecasts when each model (NWP and integrated models) was used for nowcasting for up to 6-h lead time for the specific locations at CYYZ and CYVR where observations were measured. The metrics for the NWP models were not calculated based on the NWP model’s native forecast lead time (e.g., REG, 48 h; LAM, 24 h; and RUC, 12 h) for the area that each NWP model covered.

#### 1) Temperature

Table 1 lists the mean absolute and mean errors of temperature from each model for the winter and summer periods at CYYZ and CYVR. Comparing NWP models, the GEM REG and RUC have the smallest MAEs of 1.7° and 1.1°C in the winter and summer periods, respectively, for CYYZ. GEM LAM and REG have the smallest MAEs of 1.1° and 1.3°C in the winter and summer periods, respectively, for CYVR. Comparing NWP and integrated models, the smallest MAEs for both locations and both seasons come from combining observation and NWP models (INT_OWB, method 5).

Figures 2 and 3 show the mean absolute errors of temperature by forecast lead time (up to 6 h) for the winter and summer periods at CYYZ and CYVR, respectively. Because of the averaging, where model forecasts at each hour are used more than once for different lead times, the NWP lines tend to be straight in these diagrams. Comparing the five integrated forecasts, equal weighting (INT_EW) has the largest MAE and the mean absolute errors are almost the same up to a 6-h lead time. Dynamic weighting (INT_DW) has smaller MAEs than equal weighting (INT_EW). With bias correction, the mean absolute errors are reduced in the three methods (INT_EWB, INT_AWB, and INT_OWB). The combination of observation and bias-corrected NWP models with adjusted weighting (INT_OWB) performs the best up to the 6-h lead time at both locations and during both seasons. The performances of the other integrated models are somewhat dependent on location and season. For example, dynamic weighting (INT_DW) performed much better than equal weighting plus bias correction (INT_EWB) during the summertime at CYYZ and CYVR, while equal weighting plus bias correction (INT_EWB) performed better than dynamic weighting (INT_DW) in winter at both sites. Persistence (OBSP) performed quite well from the first 0.5 to 2 h.

#### 2) Relative humidity

Table 2 lists the mean absolute and mean errors of relative humidity from each model for the winter and summer periods at CYYZ and CYVR. Comparing NWP models, GEM LAM has the smallest MAEs in the winter and summer periods at both CYYZ and CYVR. Comparing NWP and integrated models, the integrated forecasts from combining observations and NWP models (method 5) have the smallest MAEs for both locations and both seasons.

Figures 4 and 5 show the mean absolute errors of relative humidity for the winter and summer periods for CYYZ and CYVR, respectively. Comparing the five integrated forecasts, equal weighting (INT_EW) has the largest MAE and the mean absolute errors are almost the same up to a 6-h lead time. Dynamic weighting (INT_DW) has smaller MAEs than equal weighting (INT_EW). However, both of them have larger MAEs than any of the NWP models in winter for CYYZ. This may be because all NWP models had larger relative humidity biases during that time and their errors changed very irregularly. Persistence is even better than any of the NWP models up to a 6-h lead time at CYYZ during winter.

Another reason for the differences is that the sample sizes used to calculate MAE from NWP models and integrated models were different because the data server might not have received NWP forecasts or observation data during some periods. The sample sizes were 16 222, 16 128, and 16 182 for REG, LAM, and RUC, respectively. All integrated models had the same sample size of 17 389 because integrated forecasts could still be generated as long as forecasts from any NWP model were available.

However, when applying the equal weighting or dynamic weighting methods, substituting persistence data of NWP forecasts for missing data would not assure the generation of good integrated forecasts. With bias correction, the mean absolute errors were reduced in the three methods (INT_EWB, INT_AWB, and INT_OWB). Of particular note, the MAE reductions are very distinct during winter at CYYZ and CYVR. The combination of observations and bias-corrected NWP models with adjusted weighting (INT_OWB) performed the best up to a 6-h lead time at both locations and for both seasons. In decreasing order of MAE (i.e., increasing accuracy), the models are INT_EW, INT_DW, INT_EWB, INT_AWB, and INT_OWB.

#### 3) Wind speed

Table 3 lists the mean absolute and mean errors of wind speed from each model for the winter and summer periods at CYYZ and CYVR. Comparing NWP models, GEM LAM has the smallest MAE of 1.2 m s^{−1} in winter and RUC has the smallest MAE of 1.2 m s^{−1} in summer for CYYZ. GEM LAM has the smallest MAEs of 1.4 and 1.6 m s^{−1} in the winter and summer periods, respectively, for CYVR. Comparing NWP and integrated models, the integrated forecasts from combining observations and NWP models (method 5) have the smallest MAEs for both locations and both seasons.

Figures 6 and 7 show the mean absolute errors of wind speed for the winter and summer periods for CYYZ and CYVR, respectively. Comparing the five integrated forecasts, all five integrated models perform better than any of the NWP models up to a 6-h lead time. There were no big differences in MAE among the five models after around a 2-h forecast lead time.

#### 4) Wind gust

Table 4 lists the mean absolute and mean errors of wind gust (maximum wind speed) from each model for the winter and summer periods at CYYZ and CYVR. Comparing NWP models, RUC has the smallest MAEs of 1.7 and 1.5 m s^{−1} in winter and summer, respectively, for CYYZ. GEM LAM and REG have the smallest MAEs of 1.9 and 2.2 m s^{−1} in the winter and summer periods, respectively, for CYVR. Comparing NWP and integrated models, the integrated forecasts from combining observations and NWP models (method 5) have the smallest MAEs for both locations and both seasons.

Figures 8 and 9 show mean absolute errors of wind gust for the winter and summer periods for CYYZ and CYVR, respectively. Comparing the five integrated forecasts, the three integrated models with bias correction can beat any of the NWP models up to a 6-h forecast lead time during winter at CYYZ and during winter and summer at CYVR.

Because of paper length limitations, only the verification results for temperature, relative humidity, wind speed, and gust are presented in this paper. The most refined algorithms described in section 2e have been used to provide optimal integrated forecasts of temperature, relative humidity, wind speed, wind gust, crosswind, visibility, ceiling, and precipitation rate out to approximately 6 h for different projects, such as CAN-Now (IMA) and the Science of Nowcasting Winter Weather for Vancouver 2010 (SNOW-V10) (IPAG). Verifications using the most refined method (INT_OWB) for both continuous and categorical variables from different projects are given in several other publications (Huang 2011; HISa,b; IMA). These papers also demonstrate the usefulness of the nowcast methods and, specifically, INT_OWB.

### c. Case study

Integrated forecasts (INTW) (from method 5, INT_OWB) have been used for various purposes. Figures 10–13 are example time series graphs of temperature, relative humidity, wind speed, and gust that can be used to explain how the techniques described in section 2e worked in a real-time situation. The graphs were generated and saved in real time at 1305 UTC 8 September 2011. In the graphs, REG is in blue, LAM in red, RUC in orange, INTW in green, and M300 (observation) in cyan. The red vertical line indicates the time when the graph was generated and divides the graph into two parts: the past 6 h to the left and the forecast 6 h to the right. Two INTW forecasts (green dotted lines) are shown, from the runs at 0705 and 1305 UTC 8 September 2011.

Comparing time series graphs from these four variables, it can be seen that the same NWP model performed differently for different variables at the same time. For example, among the three NWP models, RUC performed best for relative humidity (see Fig. 11) and gust (see Fig. 13), while REG forecasts were closer to the observations for temperature (see Fig. 10), and LAM performed better for forecasting peak wind speed (see Fig. 12). Obviously, the equal-weighting method should not be used to generate integrated forecasts during this period. The INTW forecasts from method 5 gave the best forecasts for all four variables and the four graphs show that the INTW forecasts (from the run at 0705 UTC 8 September 2011) were closer to the observations (M300) than the forecasts from the NWP models on the left side of the red vertical line (during 0705–1305 UTC 8 September 2011).

When the NWP models show overforecast or underforecast biases, INTW can provide better forecasts for the variables. Figures 11 and 13 show that relative humidity and gust were overforecast, and Fig. 10 shows that temperature was underforecast, by the NWP models during 0700–1300 UTC 8 September 2011. Because of dynamic weighting and bias correction, INTW forecasts were closer to the M300 data (observation) and performed better than the NWP models.

Since INTW can immediately make use of the most recent observation data for guiding integrated forecasts, there is high confidence that INTW provides better forecasts for the first couple of hours. This is evident from the four time series graphs.

## 4. Discussion

The results from the verification of the four weather variables (temperature, relative humidity, and wind speed and gust) at two locations (CYYZ and CYVR) and over the winter and summer seasons show some surprisingly strong and consistent trends. First, with a few exceptions, the three integrated models (methods 3–5) clearly outperformed each of the NWP models (LAM, REG, and RUC) individually and as a whole (average MAE of the three models, NWP_{MAE}). This is evident from Tables 1–4 comparing the MAEs of the three NWP models and the five integrated models for each variable.

The MAE tables reveal a second clear trend: the MAE values decrease with each successive refinement of the integrated model (INT_EW, INT_DW, INT_EWB, INT_AWB, and INT_OWB). This result is not altogether surprising. One would expect that refinements to a simple equal weighting model (INT_EW) progressively applied using dynamic weighting (INT_DW), bias correction (INT_EWB), adjusted dynamic weighting (INT_AWB), and observation data constraint (INT_OWB) would produce models with increasingly better performance. What is surprising is the magnitude of these improvements. This is especially so when the last refinement is applied by constraining the model with the most current observations (INT_OWB). Table 5 summarizes the POI [calculated by Eq. (12) based on MAE] from INT_OWB in comparison with the average MAE from the three NWP models for CYYZ and CYVR during the winter and summer. In this case, the smallest POI is 24% (wind speed, CYYZ, summer), and the largest POI is 58% (temperature, CYVR, summer).

Third, not only are these accuracy improvements dramatic in size, but their consistency across all variables, locations, and seasons is just as remarkable. These large and consistent improvements hold true for all cases of INT_EWB, INT_AWB, and INT_OWB. Even more dramatically, the most refined integrated model (INT_OWB) has the smallest MAE values compared to all other models for all variables, locations, and seasons.

The fourth trend evident from the results is that the MAE values from method 5 (INT_OWB) for all variables, locations, and seasons are largely comparable to the observation persistence values for the first 2 h. Table 6 gives the time when NWP and INTW (from method 5) models perform better than persistence. INTW clearly outperforms persistence well before the NWP models. In many cases, the NWP models do not outperform persistence until after 6 h. Figures 1–8 show that the green curve (INT_OWB) tracks at, or just above, the cyan curve (persistence) fairly consistently before the times listed in the columns for INTW.
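The comparison against persistence can be sketched as a simple search over lead times. The example below assumes that "performs better than persistence" means the first lead time at which a model's MAE falls strictly below the persistence MAE; the exact criterion used for Table 6 is not stated here, and the function name and MAE curves are hypothetical.

```python
def first_lead_time_beating_persistence(lead_times_h, model_mae, persistence_mae):
    """Return the earliest lead time (h) at which the model MAE is strictly
    below the persistence MAE, or None if that never happens."""
    for t, m, p in zip(lead_times_h, model_mae, persistence_mae):
        if m < p:
            return t
    return None


# Hypothetical MAE curves by lead time (not values read from Figs. 1-8).
lead_times_h = [1, 2, 3, 4, 5, 6]
persistence = [0.5, 0.9, 1.3, 1.6, 1.9, 2.2]
intw = [0.6, 0.95, 1.1, 1.2, 1.3, 1.4]
nwp = [1.8, 1.8, 1.9, 1.9, 2.0, 2.3]

intw_time = first_lead_time_beating_persistence(lead_times_h, intw, persistence)
nwp_time = first_lead_time_beating_persistence(lead_times_h, nwp, persistence)
```

With these made-up curves, the integrated forecast overtakes persistence at 3 h, while the NWP forecast does not overtake it within the 6-h window, so the function returns `None`.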

Another general observation involves the three NWP models. Table 7 lists the NWP models having the smallest MAEs for all combinations of variables, locations, and seasons. The strongest trend evident is that GEM LAM is the most accurate of the three NWP models for forecasting relative humidity at both CYYZ and CYVR and for both winter and summer. GEM LAM performs better than GEM REG and RUC at CYVR and also seems more consistent at CYVR than at CYYZ. However, RUC seems to be the better performer for three variables (temperature, wind speed, and gust) at CYYZ during the summer season. Table 7 makes clear that the performances of the NWP models relative to each other vary according to variable, location, and season. None of the NWP models shows consistent performance, with the exception of GEM LAM in forecasting relative humidity. When comparing all models, the INT_OWB model is the best and most consistent performer (with the smallest MAE) regardless of time and location.

In those few cases where the MAE differences between the integrated models and the NWP models are small, it is difficult to ascertain whether their MAE values are significantly different. However, the main purpose of the comparison is to determine whether the integrated model results are better than those of the best NWP model. This study clearly supports such a conclusion.

There are certain advantages to using the most refined algorithms in a nowcast system. A comparison of traditional NWP- and INTW-based nowcasting systems is given in Table 8.

The major strengths of INTW are summarized as follows: 1) overall, INTW produces more accurate results (smaller MAE and higher Heidke skill scores) than individual or combined NWP models regardless of location, season, and variable analyzed (HISa,b; IMA); 2) when NWP models show over- or underforecasted bias, INTW can provide better forecasts for the variables; and 3) INTW can increase the likelihood that nowcasts are available relative to a system relying on a single NWP model. As long as one NWP model and observations exist, INTW forecasts can still be generated. Finally, there is high confidence that INTW provides better forecasts than NWP models for the first couple of hours.

However, there are certain limitations of INTW. For example, INTW can only be calculated when NWP model(s) and observations are available for processing. INTW can only forecast those variables that exist in both NWP model(s) and observation data. INTW forecast accuracies are affected by NWP model forecast accuracy. The algorithms can deal well with over- or underforecasted bias lasting for certain periods. There is a need to improve algorithms to deal with mixed cases. The current techniques are more effective for some variables (e.g., temperature, relative humidity, wind speed, crosswind, and gust) than other variables (e.g., wind direction, visibility, ceiling, and precipitation rate). The most refined method is good for point forecasts but does not take into account certain issues such as orographic enhancement, convective initiation, or upstream weather.

## 5. Conclusions

This study supports Clemen’s (1989) contention that combining multiple forecasts increases forecast accuracy and that even simple averaging may bring dramatic improvements. The development of equations and algorithms applying successive refinements to the initial equal-weighting model yielded increasing improvements in accuracy. An extremely important contributor to this improvement is properly taking into account bias and systematic errors present in models. This must be clearly recognized, especially when applying models to situations for which they were not originally designed, such as in the case of NWP models used for nowcasting. In addition, high-frequency observations and different NWP models are critical for deriving the best integrated forecasts regardless of location, time, and forecast variables analyzed.

While the techniques presented in this study worked very effectively for the four continuous weather variables, they were not very effective when applied to other variables such as wind direction, visibility, and precipitation. A different set of techniques is being developed for these variables and will be the subject of a follow-up paper. The results of this study justify the use of integrated NWP forecasts for nowcasting provided they are properly integrated using appropriate and specifically designed rules and algorithms.

## Acknowledgments

The authors thank the anonymous reviewers for their useful comments on the manuscript. Special thanks to our colleagues Monika Bailey, Stewart Cober, Ismail Gultepe, Paul Joe, Faisal Boudala, Zuohao Cao, Robert Crawford, Ivan Heckman, and Janti Reid for their contributions and valuable discussions, and to other colleagues who provided observation data and NWP model products at Environment Canada and NOAA/NCEP. Funding from Environment Canada, Transport Canada, Search and Rescue New Initiatives Fund, NAV CANADA, and NSERC is gratefully acknowledged.