## 1. Introduction

The success of hydrological modeling is usually constrained by the rainfall data used (Berne et al. 2004). These constraints are even more important in forecasting mode. It has been found that most of the uncertainty in rainfall-runoff forecasting schemes over small and medium catchments comes from the uncertainty in predicting rainfall at high temporal and spatial resolutions (Zappa et al. 2010).

Different types of quantitative precipitation forecasts (QPFs), such as numerical weather prediction (NWP) rainfall forecasts (Bartholmes and Todini 2005) or blended QPFs (Atencia 2010), have shown benefits in real-time flood forecasting and warning systems. These methodologies solve the primitive equations of the atmosphere at high spatial and temporal resolution in an attempt to reproduce the rainfall fields at meso-*β* scales. However, it has been proven that the scales actually resolved (Surcel et al. 2015), even when data assimilation is introduced in the initial conditions, are larger than those required by hydrological models.

Radar-based heuristic nowcasting techniques have become a key component in hydrometeorological forecasting (Harrison et al. 2012) because these QPFs are characterized by a high initial skill at the required scales for hydrological purposes. However, this skill rapidly decreases with the forecast lead time as shown by Golding (1998). The rapid loss of prediction capability of these heuristic techniques is due to two main sources: growth and decay of precipitation patterns and evolution of the advection field. Several studies [such as Germann et al. (2006)] have shown the case dependency of these two factors.

These errors in the high-resolution forecasts can be introduced into the QPF by probabilistic methodologies. Taking advantage of the multifractal scaling behavior of rainfall fields (Lovejoy and Schertzer 1990; Venugopal et al. 1999) several models such as Mackay et al. (2001) or the “string of beads” model (Pegram and Clothier 2001) have been used to generate a number of realistic future rainfall scenarios (called members) compatible with observations in a nowcasting framework (Berenguer et al. 2011; Bowler et al. 2006). The spread of this ensemble of members where the uncertainties in the evolution of precipitation patterns are simulated can be further narrowed down by imposing statistical properties (Atencia and Zawadzki 2014). However, these probabilistic radar-based heuristic nowcasting methodologies lack some physical properties in the evolution of the rainfall fields.

A new approach for probabilistic quantitative precipitation nowcasting has been devised recently by using analogs (Panziera et al. 2011; Foresti et al. 2015). Analogs represent two atmospheric states closely resembling each other (Lorenz 1969). Each state may then be considered as equivalent to the other state with a reasonably small error. The main hypothesis of analog-based forecasting methods is that the two analog states evolve in similar ways. This analog state, considered to be the “real” analog, will follow the actual meteorological system including all the physical processes (such as local effects or latent heat exchange) that play a role for an observed meteorological situation. The mentioned meteorological system is the real atmosphere instead of an approximation such as the primitive equations used in the NWP models. Consequently, the solved scales are suitable for hydrological purposes.

The main problem of the analog approach is to find analogs good enough to be considered as real analogs. A large enough dataset has to be used to be able to find good analogs for large domains (Van den Dool 1994). A smaller domain can partially solve the problem of the size of the dataset as far as the flow is persistent in the given area (Root et al. 2007). The use of predictors for searching analog situations has, recently, shown that the analog approach can beat both climatology and persistence for medium-range prediction (Obled et al. 2002) and very short-term forecasting (Panziera et al. 2011). Diomede et al. (2006) compared an hourly analog QPF with the forecasted rainfall field obtained by the Bologna limited-area (LAMBO) model. Their results show a large spread among the members of the analog ensemble due to a limited historical dataset. However, a comparison with a state-of-the-art probabilistic nowcasting technique would be useful for a better understanding of the advantages and disadvantages of the analog approach in a very short-term framework.

In this article an analog-based rainfall nowcasting technique is presented. Both a large dataset and physical properties of the rainfall field are used to find a *real* analog, which is a close state in the actual atmospheric phase space. (In fact, this phase space cannot be determined because of the large number of variables or degrees of freedom in it; a large dataset is used to increase the probability of finding a close enough state in this unknown phase space.) Section 2 presents the data used in this study and the case studies. In section 3, the methodology for selecting the analogs is introduced, and the indices used for verification are presented. A comparison between the analog technique and a probabilistic radar-based quantitative precipitation nowcasting method is carried out and the results are shown in section 4. The main findings of this study are discussed in section 5.

## 2. Dataset and case studies

A large dataset is required to find appropriate analog states. With this purpose, two composite mosaics over North America have been used. The first one is the National Operational Weather radar (NOWrad) mosaic produced by the Weather Services International (WSI) Corporation. NOWrad is a three-step quality-controlled product with a 15-min temporal resolution and 2-km spatial resolution. These mosaics show the maximum reflectivity measured by any radar at each grid point at any of the 16 vertical levels. These data are available for the period from October 1995 to December 2007. The second dataset is produced by Weather Decision Technologies (WDT) and uses radar data from the entire Weather Surveillance Radar-1988 Doppler (WSR-88D) network in the continental United States (CONUS). This allows WDT to apply their most up-to-date, technologically advanced algorithms to provide superior quality radar data through the removal of false echoes and through the blending of multiple radars. WDT creates seamless radar mosaics with a high spatial resolution (1 km) and temporal resolution (5 min) from January 2004 to April 2011.

A common grid is defined for the two datasets used in this study (Fig. 1). The new 512 × 512 points grid has a 4-km resolution. Each reflectivity map at the original resolution, *Z*. The temporal resolution of the new dataset is 15 min. The selected domain avoids the Rocky Mountains because of their orographic effects on rainfall fields and the blockage they produce in the radar rainfall images.

Four case studies have been selected to test the probabilistic nowcasting techniques. Two of them are high-predictable cases according to the lifetime definition of Germann and Zawadzki (2002, hereafter GZ02). The other two events are low-predictable cases consisting of mesoscale systems. In GZ02 the threshold

Properties and statistics of the four precipitation events used in this study. The extent is defined as the area with a reflectivity higher than 15 dB*Z*. The lifetime of the event is determined by the decorrelation time in Lagrangian coordinates. The fraction of that area with reflectivities higher than 35 dB*Z* defines the percentage of convective precipitation (C. fraction). The values are averages over the entire period.

The data from the North American Regional Reanalysis (NARR) project have been used in this study to characterize the synoptic situation. The synoptic situation will be used in the analog selection technique to avoid selecting analogs with different external forcing due to different synoptic situations. The details of the reanalysis data can be found in Mesinger et al. (2006). These reanalysis products have a 32 km/45 layer resolution with 3-hourly temporal resolution. In this study 20 variables are used. These variables are as follows: geopotential height; pressure vertical velocity; specific humidity; *u*- and *υ*-wind components; temperature for the levels of 925, 850, 750, and 500 hPa; and convective available potential energy (CAPE), potential temperature, and precipitable water for the entire atmosphere and storm relative helicity as surface level variables.

## 3. Methodology

Two different probabilistic methods are compared in this paper. The first one was introduced in a previous contribution (Atencia and Zawadzki 2014). The second technique is based on the analog state theory to obtain an ensemble of plausible future rainfall fields based on an historic dataset of reflectivity fields. The full details of this selection methodology are introduced in section 3a. The indices selected to carry out the comparison of both techniques are introduced in section 3b.

### a. Analog selection

According to Lorenz (1969), analogs are two similar states of the atmosphere, where each state may then be considered as equivalent to the other state plus a small perturbation. The main hypothesis of the analog-based forecast is that these two states will evolve in a similar way because this evolution is caused or driven by the meteorological situation. Ideally, these two states are considered to be analogs if the three-dimensional distribution of weather variables such as temperature, pressure, water vapor, and clouds, and environmental factors such as sea surface or solar heating, are similar. However, the objective of this study is the short-term forecasting of reflectivity fields. Consequently it is necessary (proving that it is also sufficient is beyond the scope of the present paper) that the score used to quantify the similarity between analog states simply measures the differences in the spatial distribution of the reflectivity fields.

This index is selected because of its simplicity and can be related to the lifetime of precipitation patterns as defined by GZ02. Other indices, such as the root-mean-square error (RMSE) and the critical success index (CSI), have also been used, giving similar or less informative results.

One of the main concerns when looking for an analog situation is its existence within the dataset, or alternatively, if the dataset is long enough for the occurrence of a similar situation with the observed one. Van den Dool (1994) stated that a dataset is long enough to find a “good” analog when the distance to the nearest neighbor (DNN) is significantly smaller than the distance to climatology (DC). In Fig. 3 the distance to climatology and the DNN are plotted for the whole year of 1996. It can be observed that the distance has a dependence on the time of the year (i.e., a bigger difference in summer than in winter). However, the DNN is always smaller than the DC and it is significantly different most of the days, concluding that the dataset is long enough for searching for an analog situation.
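The Van den Dool (1994) criterion can be illustrated with a minimal sketch. It assumes, purely for illustration, that the distance between two fields is their root-mean-square difference and that climatology is the pixel-wise mean of the archive; the function names and the toy values are hypothetical and are not taken from the study.

```python
import math


def rms_distance(a, b):
    """Root-mean-square difference between two flattened fields of equal size."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def climatology(fields):
    """Pixel-wise mean of all archived fields (assumed definition of climatology)."""
    n = len(fields)
    return [sum(values) / n for values in zip(*fields)]


def dnn_and_dc(target, archive):
    """Return (distance to nearest neighbor, distance to climatology)."""
    dnn = min(rms_distance(target, field) for field in archive)
    dc = rms_distance(target, climatology(archive))
    return dnn, dc


# Toy archive of flattened 2 x 2 reflectivity fields (dBZ); values are invented.
archive = [
    [0.0, 10.0, 20.0, 30.0],
    [30.0, 20.0, 10.0, 0.0],
    [2.0, 12.0, 18.0, 28.0],
]
target = [1.0, 11.0, 19.0, 29.0]

dnn, dc = dnn_and_dc(target, archive)
archive_is_long_enough = dnn < dc  # the criterion discussed in the text
```

In this toy case the nearest archived field is much closer to the target than the climatological mean is, which is the situation the text describes for the 1996 test year.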

Techniques like neural networks (NN) or principal component analysis (PCA) have been used as methodologies to carry out the search for analog situations in large datasets (among others, Nishiyama et al. 2007; Foresti et al. 2012). These methodologies divide the whole dataset into groups (similar patterns for NN or variance representation for PCA), accelerating the search for analog situations. Neither of these methodologies is perfect, so to avoid the introduction of errors due to the simplification of the dataset by dividing it into groups, the classification is carried out using the whole dataset every time. This choice avoids the introduction of errors, but as a consequence, the computational time to find the analogs is too long for implementing this technique in real time.

The complete methodology used for choosing the analogs consists of the following three steps:

- Computing the spatial cross correlation of reflectivity fields as a function of separation distance while taking into account geographical factors.
- Computing and comparing the spatially averaged temporal correlation between the current and the previous three reflectivity fields.
- Comparing the similarity of the synoptic situation by computing the Euclidean distance among three meteorological variables.

In addition to the above computations, time of the year (3-month window) and time of the day (

The first step in searching for analogs is based on the comparison of the precipitation pattern by computing the spatial cross correlation (an example is shown in Fig. 5). A cross-correlation value close to 1 means high similarity while values close to 0 indicate completely different patterns in the reflectivity fields. The position of the maximum value in the matrix is equivalent to the displacement required in the analog reflectivity field to overlap the observed field. It is unlikely that two precipitation patterns occurring at distant locations will resemble each other closely. However, if they do, they cannot be expected to vary similarly because the orographic forcing is dissimilar. To take this effect into consideration, a maximum acceptable displacement is allowed depending on the geographical location of the rainfall pattern.

In this study, the maximum acceptable displacement is set to 50 km in the west–east direction and 140 km in the meridional direction (this area is shown in Fig. 5). These distances were chosen in order to take into account the orography in the reflectivity field domain. Figure 6 shows that for a 50-km zonal displacement, and an equivalent correlation value, a 140-km meridional displacement is acceptable. This is mainly due to the fact that CONUS has a north–south-oriented mountain range, as can be seen in Fig. 6b.
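The displacement-limited search described above can be sketched as a brute-force scan over pixel shifts. The sketch assumes a Pearson-type correlation computed over the overlapping region (the exact form of the score used in the study is not reproduced here), and `best_shift` and the toy fields are hypothetical names and values.

```python
import math

GRID_KM = 4.0          # spacing of the common grid (section 2)
MAX_ZONAL_KM = 50.0    # west-east limit quoted in the text
MAX_MERID_KM = 140.0   # north-south limit quoted in the text

# Displacement limits expressed in whole pixels.
MAX_DX = int(MAX_ZONAL_KM // GRID_KM)   # 12 pixels
MAX_DY = int(MAX_MERID_KM // GRID_KM)   # 35 pixels


def correlation(a, b):
    """Pearson correlation of two equally long lists (assumed form of the score)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0


def best_shift(obs, cand, max_dx, max_dy):
    """Scan shifts within the allowed window; return (max correlation, dx, dy).

    obs[y][x] is compared with cand[y + dy][x + dx] over the overlapping region.
    """
    ny, nx = len(obs), len(obs[0])
    best = (-1.0, 0, 0)
    for dy in range(-max_dy, max_dy + 1):
        for dx in range(-max_dx, max_dx + 1):
            a, b = [], []
            for y in range(ny):
                for x in range(nx):
                    ys, xs = y + dy, x + dx
                    if 0 <= ys < ny and 0 <= xs < nx:
                        a.append(obs[y][x])
                        b.append(cand[ys][xs])
            if len(a) > 1:
                r = correlation(a, b)
                if r > best[0]:
                    best = (r, dx, dy)
    return best


# Toy 8 x 8 observed field (dBZ); values are invented.
obs = [[0.0] * 8 for _ in range(8)]
for (y, x), z in {(2, 2): 30.0, (2, 3): 40.0, (3, 2): 20.0,
                  (3, 3): 50.0, (4, 4): 35.0, (1, 4): 25.0}.items():
    obs[y][x] = z

# Candidate analog: the same pattern moved 2 rows down and 1 column right.
cand = [[obs[y - 2][x - 1] if y >= 2 and x >= 1 else 0.0 for x in range(8)]
        for y in range(8)]

# A small search window is used here because the toy grid is tiny.
r, dx, dy = best_shift(obs, cand, max_dx=2, max_dy=3)
```

On the full 512 × 512 grid the window would be `MAX_DX` by `MAX_DY` pixels, and an FFT-based correlation would be a more practical choice than this brute-force loop.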

The first step in searching for analogs results in a set of analog reflectivity fields based on the spatial distribution of the reflectivity. The next step is to search for a similar motion and evolution of the analog reflectivity field compared to the observed one. For this step, the temporal correlation of the observed rainfall field is studied for an hour (the current rainfall field and the three previous ones). A similar temporal correlation in each interval (it does not have to be high) would ensure a comparable evolution of the rainfall field. However, the temporal correlation cannot guarantee similar motion fields. For this reason, the spatial cross correlation of the three previous reflectivity fields and the latest observed field is computed (Fig. 7). The same cross correlation is computed for the four analog reflectivity fields. A similar maximum cross correlation for the four consecutive images would suggest a comparable temporal correlation and, at the same time, a similar motion of the rainfall field. This is compatible with computing the correlation of the analog reflectivity field and the observed reflectivity field as a function of time as can be seen in Fig. 7.
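A simplified sketch of the second step follows. It compares the correlation of the latest field with each of the three previous fields for the observed and the candidate sequences. For brevity it uses the zero-displacement correlation rather than the maximum of the cross-correlation matrix described above, and the tolerance `tol` is a hypothetical parameter, not a value from the study.

```python
import math


def correlation(a, b):
    """Pearson correlation of two equally long lists (assumed form of the score)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0


def temporal_signature(fields):
    """Correlation of the latest field with each of the three previous ones.

    `fields` holds four flattened fields ordered from oldest to newest.
    """
    latest = fields[-1]
    return [correlation(latest, previous) for previous in fields[:-1]]


def similar_evolution(obs_fields, analog_fields, tol=0.1):
    """Hypothetical rule: each lagged correlation must agree within `tol`."""
    sig_obs = temporal_signature(obs_fields)
    sig_ana = temporal_signature(analog_fields)
    return all(abs(o - a) <= tol for o, a in zip(sig_obs, sig_ana))


# Toy one-dimensional "fields" over one hour (four 15-min steps); values invented.
obs_seq = [[0, 0, 1, 2, 0, 0], [0, 1, 2, 1, 0, 0],
           [0, 1, 3, 1, 0, 0], [0, 2, 3, 1, 0, 0]]
analog_good = [[2 * v for v in field] for field in obs_seq]   # same evolution
analog_bad = [[2, 1, 0, 0, 0, 0], [0, 0, 0, 0, 1, 2],
              [1, 0, 2, 0, 0, 1], [0, 2, 3, 1, 0, 0]]          # erratic evolution

keep_good = similar_evolution(obs_seq, analog_good)
keep_bad = similar_evolution(obs_seq, analog_bad)
```

Note that the rule only asks for similar correlations, not high ones, in line with the remark in the text.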

Finally, the synoptic situations are compared. There are three main physical factors involved in rainfall occurrence: instability, moisture, and forcing. Therefore, three meteorological variables representative of these three factors should be chosen to characterize the synoptic situation. To select these variables, a study of the importance of 20 different variables (a list of all the variables can be found in section 2) has been carried out. The study to determine the best three synoptic variables combination uses each available time for every day for one year. For every 3-h time step each combination of three variables was compared to the remaining 14 years of the dataset in order to find the most similar synoptic state. Once this date and time were found, the correlation of the rainfall patterns was calculated. This resulted in 2920 (365 days, 8 periods a day) rainfall pattern correlations that were then averaged. This procedure was repeated for every 3-set combination resulting in 1140 (this number is the total number of possible combinations of 3 variables over 20 variables used) different averaged correlations. Figure 8 shows the results from this process. The horizontal axis demonstrates the importance of some of the synoptic variables. For example, the PVV 850 hPa was found most frequently for the top

While this study indicated some sensitivity to the choice of variables, temperature at 500 hPa, pressure vertical velocity at 850 hPa, and humidity at 700 hPa were found to have the highest correlation, as can be seen in the zoomed in part of Fig. 8, and therefore best characterize the spatial distribution of rainfall.
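The counts quoted in this selection study (1140 variable triples and 2920 time steps per year) can be checked with a short sketch. The variable labels and the `best_subset` helper are placeholders introduced for illustration only.

```python
from itertools import combinations
from math import comb

N_VARIABLES = 20      # synoptic variables listed in section 2
SUBSET_SIZE = 3       # one each for instability, moisture, and forcing
DAYS = 365
STEPS_PER_DAY = 8     # 3-hourly reanalysis fields

n_subsets = comb(N_VARIABLES, SUBSET_SIZE)   # number of variable triples
n_times = DAYS * STEPS_PER_DAY               # correlations averaged per triple

# Placeholder labels; the real variable names are those listed in section 2.
variables = [f"var{i:02d}" for i in range(N_VARIABLES)]
triples = list(combinations(variables, SUBSET_SIZE))


def best_subset(mean_corr_by_subset):
    """Return the triple with the highest averaged rainfall-pattern correlation."""
    return max(mean_corr_by_subset, key=mean_corr_by_subset.get)


# Invented scores, only to show how the ranking would be read.
toy_scores = {triples[0]: 0.21, triples[1]: 0.35, triples[2]: 0.28}
winner = best_subset(toy_scores)
```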

In the third step the root-mean-square difference between the observed and analog fields is calculated for these three variables over the NARR model domain to check if the external large-scale forcing is similar for the analogs selected according to the spatial distribution (first step) and the temporal evolution (second step) of the rainfall field. Only the analog members with a root-mean-square difference smaller than half the climatological RMSE are selected in the third step. After applying these steps, an ensemble of analog rainfall states is obtained. The temporal evolution of this ensemble of analog states not only provides a set of feasible future states but also could show the natural spread of the forecasts depending on each meteorological situation (Atencia et al. 2013). This natural spread is related with the intrinsic predictability of the atmospheric state.
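The third step can be sketched as a simple filter. The text does not state how the three variables are combined, so the sketch assumes that each variable must individually satisfy the half-climatological-RMSE criterion; the dictionary keys, units, and numbers are hypothetical.

```python
import math


def rms_difference(a, b):
    """Root-mean-square difference between two flattened fields."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def passes_synoptic_check(obs_vars, analog_vars, clim_rmse, factor=0.5):
    """Keep an analog only if every variable differs by less than
    `factor` times its climatological RMSE (assumed per-variable rule)."""
    for name, obs_field in obs_vars.items():
        if rms_difference(obs_field, analog_vars[name]) >= factor * clim_rmse[name]:
            return False
    return True


# Invented values on a three-point "domain".
obs_vars = {"T500": [250.0, 252.0, 254.0],
            "PVV850": [-0.2, 0.0, 0.2],
            "Q700": [4.0, 5.0, 6.0]}
near_analog = {"T500": [251.0, 253.0, 255.0],
               "PVV850": [-0.1, 0.1, 0.3],
               "Q700": [4.5, 5.5, 6.5]}
far_analog = {"T500": [260.0, 262.0, 264.0],
              "PVV850": [-0.1, 0.1, 0.3],
              "Q700": [4.5, 5.5, 6.5]}
clim_rmse = {"T500": 4.0, "PVV850": 0.4, "Q700": 2.0}

keep_near = passes_synoptic_check(obs_vars, near_analog, clim_rmse)
keep_far = passes_synoptic_check(obs_vars, far_analog, clim_rmse)
```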

The average time to find the analogs for each of the 15-min time stamps in a forecast is around 5 days (this search is done using 8 CPUs during the computation of the correlation for searching the similar spatial pattern analogs). Each observation is compared with the fields falling within a 3-month range and a 7-h range in the 15 years of historical data, which amounts to around 39 000 files. Taking into account that the temporal correlation computation also needs the three previous files, this is a maximum of 118 000 files. The whole process takes around 3 s for each file, giving a computational time of around 400 000 s, or 4.6 days.

### b. Verification of the probabilistic forecasts

The main objective of this contribution is to show the pros and cons of the analog-based nowcasting in comparison with a previously developed probabilistic Lagrangian nowcasting technique (Atencia and Zawadzki 2014). To assess the benefits of introducing uncertainty in Lagrangian forecasts and to compare these purely stochastic errors with the real evolution of rainfall fields in similar situations, the following four scores are computed:

- Cross correlation: The formula for the cross-correlation score was presented in Eq. (1). It is used to compare the pattern of precipitation reproduced by our ensemble generation technique, the Lagrangian extrapolation, and the analog ensemble of reflectivity fields with the a posteriori observed reflectivity field. It gives information about the resemblance of the precipitation patterns used in the forecasting mode with the actual precipitation patterns. This score takes a value of 1 for a totally correlated forecast (exactly the same pattern) and a value of 0 for uncorrelated patterns. In GZ02 the threshold $1/e$ is determined as the lower limit to attribute predictability to a forecast.
- Brier score: This score gives information about the magnitude of the probability forecast errors and is defined in Eq. (2). The range of the Brier score is between 0 and 1, a value of 0 indicating a perfect forecast, whereas the higher the Brier score value, the worse the forecast is, as shown here:

  $$\mathrm{BS} = \frac{1}{N}\sum_{i=1}^{N}\left(\frac{1}{M}\sum_{j=1}^{M} f_{i,j} - o_i\right)^{2}, \tag{2}$$

  where $N$ is the total number of points of the domain, $M$ is the number of ensemble members, and $f_{i,j}$ and $o_i$ stand for the *j*th ensemble member and the observation at pixel *i*, respectively, and they are either 1 or 0 depending on whether an event occurred or not.
- ROC area score: This score is the area under the relative operating characteristic (ROC) curve. The ROC is a plot of the probability of detection (POD) versus the false alarm rate (FAR). These two scores are contingency-table scores that are based on dichotomous forecasts (Dobryshman 1972). A dichotomous forecast indicates whether an event has occurred or not at each point of the grid and it is specified by the exceedance of a certain threshold. In this study 15 dB*Z* (0.3 mm) is the selected threshold. The four combinations of forecasts (yes or no) and observations (yes or no) are hit, miss, false alarm, and correct negative. The two scores used to define the ROC curve are computed as

  $$\mathrm{POD} = \frac{\text{hits}}{\text{hits} + \text{misses}}, \tag{3}$$

  $$\mathrm{FAR} = \frac{\text{false alarms}}{\text{false alarms} + \text{correct negatives}}. \tag{4}$$

  These indices are computed by using a set of increasing probability thresholds (e.g., 0.05, 0.15, 0.25, etc.) to make the yes/no decision for the ensemble forecast. The ROC area score takes values from 0 to 1, where 1 stands for a perfect forecast and values lower than 0.5 indicate no forecast skill.
- Spread: The spread of an ensemble of members at a particular forecast time step is measured as

  $$\mathrm{spread} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\frac{1}{M}\sum_{j=1}^{M}\left(f_{i,j} - \bar{f}_{i}\right)^{2}}, \tag{5}$$

  where $N$ is the total number of points of the domain, $M$ is the number of ensemble members, $f_{i,j}$ stands for pixel *i* of the *j*th ensemble member, and $\bar{f}_{i}$ is the mean of the ensemble members at a given pixel *i*.

According to Whitaker and Loughe (1998), the ensemble spread [Eq. (5)] should be similar to the skill of the ensemble mean measured by the RMSE to ensure a correct representation of the forecast uncertainty. This equality relation comes from the idea that in a perfect ensemble, any of the members should be indistinguishable from the truth and the truth could be one of the members. The variability between the members is measured by the standard deviation (average difference between the members and the mean), and as the truth is equivalent to any of the members, the RMSE of the mean should equal the standard deviation.
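The probabilistic scores described above can be sketched in a few lines. This is an illustrative implementation under stated assumptions: the ensemble probability is taken as the fraction of members forecasting the event, the false alarm rate is computed as false alarms over all observed non-events, the ROC area is integrated with the trapezoidal rule, and the spread uses the population form of the variance (division by the number of members). All function names and toy values are hypothetical.

```python
import math

# Probability thresholds 0.05, 0.15, ..., 0.95 for the yes/no decision.
THRESHOLDS = tuple(0.05 + 0.1 * k for k in range(10))


def ensemble_probability(members):
    """Fraction of binary (0/1) members forecasting the event at each pixel."""
    m = len(members)
    return [sum(column) / m for column in zip(*members)]


def brier_score(members, obs):
    """Mean squared difference between forecast probability and binary observation."""
    p = ensemble_probability(members)
    return sum((pi - oi) ** 2 for pi, oi in zip(p, obs)) / len(obs)


def roc_area(members, obs, thresholds=THRESHOLDS):
    """Area under the (FAR, POD) curve, integrated with the trapezoidal rule."""
    p = ensemble_probability(members)
    points = [(0.0, 0.0), (1.0, 1.0)]
    for t in thresholds:
        hits = sum(1 for pi, oi in zip(p, obs) if pi >= t and oi == 1)
        misses = sum(1 for pi, oi in zip(p, obs) if pi < t and oi == 1)
        false_alarms = sum(1 for pi, oi in zip(p, obs) if pi >= t and oi == 0)
        correct_neg = sum(1 for pi, oi in zip(p, obs) if pi < t and oi == 0)
        pod = hits / (hits + misses) if hits + misses else 0.0
        far = false_alarms / (false_alarms + correct_neg) if false_alarms + correct_neg else 0.0
        points.append((far, pod))
    points.sort()
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def ensemble_spread(members):
    """Square root of the pixel-averaged ensemble variance (population form)."""
    m, n = len(members), len(members[0])
    total = 0.0
    for column in zip(*members):
        mean = sum(column) / m
        total += sum((v - mean) ** 2 for v in column) / m
    return math.sqrt(total / n)


# Toy binary ensemble (3 members, 4 pixels) and observation; values invented.
binary_members = [[1, 1, 0, 0], [1, 0, 0, 0], [1, 1, 1, 0]]
binary_obs = [1, 1, 0, 0]
bs = brier_score(binary_members, binary_obs)
area = roc_area(binary_members, binary_obs)

# Toy reflectivity ensemble (dBZ) for the spread.
dbz_members = [[10.0, 20.0], [20.0, 20.0], [30.0, 20.0]]
spread = ensemble_spread(dbz_members)
```

In this toy case every observed event receives a higher probability than every non-event, so the ROC area reaches its maximum even though the Brier score is not zero; the two scores measure different aspects of the forecast.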

## 4. Results

The main objective of this article is to compare the skill of two ensemble forecast techniques: one is based on the statistical properties of the image and the radar-based Lagrangian extrapolation, and the other is based on the analog concept. This comparison is carried out in section 4b. Before that, the analog set of reflectivity fields is tested to verify if the selection criteria give a narrow but meaningful ensemble of initial states.

### a. Verification of analogs

It is difficult to test the quality of the selection criteria of the analog techniques as long as it has to be assessed by means of the same index used to select the analog events. However, it can be determined if the rules used to reduce the number of analog events, such as the temporal correlation or the external forcing, select the most meaningful analogs. This analysis can be carried out by comparing the predictive skill (by using the correlation index between an analog member and observation) of the final analog selection with the predictive skill of the initial analog selection (the ones chosen after the first step based on the spatial cross correlation) and, also, with the members chosen after the second step based on the temporal correlation. Figure 9 shows the correlation as a function of the lead time for the four case studies. The different steps in the selection algorithm increase the correlation for all the cases at any lead time. This proves that the selection rules are selecting meaningful analogs. It can be observed that these rules have a small impact for the 0 lead time. This is because the spatial correlation rule (first step in the selection algorithm) is measuring the correlation but the other two rules are not taking into account the best members according to the spatial correlation. It is important to notice that the temporal correlation improves significantly the predictive skill of the final ensemble from the lead time of 1 h. Finally, it can be observed that the improvement due to the comparison of synoptic situation is case dependent. For a mesoscale forced event, such as 22 July, the improvement is almost negligible. On the other hand, for a large-scale event, such as 18 April, the improvement is remarkable and it has a higher impact for later lead times. This can be explained because the time scales of synoptic processes are longer than the ones studied in this paper.

The number of analogs obtained with this technique is not selected by the user but results from applying the three main steps. The first step finds a given number of similar pattern analogs; among these analog members, the ones without a similar temporal evolution are removed. The third step only keeps the events that are synoptically similar. A minimum of eight analogs is required to be sure of having enough members to assess the probability at the pixel scale. This number is based on the number of ensemble members used in a similar domain for NWP data assimilation ensemble forecast (Xue et al. 2008). The total number of analog members obtained is larger than 8 for the whole forecast period, except for one interval of the last event (22 July 2008). Table 2 shows the average number of analogs selected for each event. The numbers in parentheses represent the minimum and maximum number of members obtained during the whole forecast period.

(from left to right) Average number of members obtained after applying the spatial correlation, temporal correlation, and synoptic forcing. The first number in parentheses stands for the minimum number of analogs obtained and the second for the maximum for the whole event. The number of members is kept for the 10-h lead time.

Once appropriate analog events have been selected, their skill can be measured. The skill is assessed by the correlation between a member and the contemporaneous reflectivity field. Figure 10 shows the correlation for the two highly predictable events (Figs. 10a and 10b) and for the two less predictable events (Figs. 10c and 10d). Figure 10 shows that the quality of the forecast is case dependent but the behavior is similar for similar events; that is, the two highly predictable events have a similar skill, as do the two less predictable cases. The analog forecasts have a lower correlation (worse skill) than the Lagrangian extrapolation for the first few hours (between 2 and 4 h). This is expected because in a chaotic system like the atmosphere, it is improbable that two observed states are sufficiently close to each other, at least in a fine-grained fashion. Consequently, the Lagrangian extrapolation is intrinsically closer to the previous observation than two analog states are to each other. However, the temporal slope of the skill of the forecast is quite different. Whereas the advection of the latest radar field has an exponential slope with the decorrelation time as coefficient, the analog technique has a lower slope at the beginning, reaching a more or less constant skill toward the end. This behavior is clearer in the highly predictable events (Figs. 10a,b), where from the second or third hour of forecast the skill is kept between 0.6 and 0.3 for most of the forecasting period. The less predictable events show a different behavior. The slope is less steep than for the Lagrangian extrapolation for the first 8 hours of the event; afterward, the behavior changes. It seems that the diurnal cycle influences the skill of the forecast. When the diurnal heating produces more intense precipitation, lower skill is obtained. After the maximum forcing caused by diurnal heating, the behavior is similar to that of the highly predictable events, with a flat slope in the skill. These two slopes can be related to different predictability scales. The convective-scale predictability has a different slope than the mesoscale predictability, so the slope of the convective scales dominates for the first hours whereas the mesoscale predictability slope takes over for the last hours of these forecasts.

To understand the meaning of the previous values of correlation an example of the probability of rain obtained by the analog-based ensemble forecasting methodology is plotted in Fig. 11. The two examples selected are the most predictable event (18 April) and the least predictable event (22 July). The area with a probability of rain higher than 50% matches reasonably well the actual reflectivity field until 10 h for the first example plotted (Fig. 11a). This is compatible with the correlation values observed in Fig. 10a. On the other hand, the area with probability of rain higher than 25% fits the reflectivity field at a 5-h lead time for the main area of precipitation only. However, the probability of rain does not match with the actual reflectivity field for the 10-h lead time in the less predictable event. This disagreement results in low values of correlation.

The last property of the analog ensemble is the spread and its temporal evolution. This is an important property of probabilistic forecasts for verifying if the uncertainty of the ensemble is a correct representation of the forecast uncertainty. As mentioned previously, the best way to do this verification is to compare the ensemble spread against the RMSE of the ensemble mean. This comparison is shown as a function of the lead time in Fig. 12. It can be observed that during the whole forecast period the ensemble spread has little variation; it is almost constant. Consequently, it can be said that the analogs have a similar evolution (compared with their mean), which is a desired property for “good” analogs. On the other hand, the RMSE of the ensemble mean against the observation slightly increases as a function of the lead time. However, this variation is smaller than the variation in the RMSE obtained by the probabilistic Lagrangian forecast [see Fig. 17 in Atencia and Zawadzki (2014)]. It can be observed that during the whole forecasting period both measurements are similar for three out of four events. The last event (22 July) has a larger RMSE than ensemble spread, meaning that the uncertainty of this ensemble forecast is smaller than required, or in other words, the ensemble spread is underestimated. This can be due to the fact that the analogs obtained for this event have a lower quality (in terms of spatial correlation) compared with the other three events. Despite this underestimation, the difference between RMSE and spread for the analog technique is mostly smaller than that obtained by using stochastic techniques.

### b. Comparison of the two methods

The three scores introduced in section 3b are used to compare the two ensemble generation methods. First, the correlation scores for the two forecasting methods and the Lagrangian extrapolation are presented in Fig. 13. This shows that the deterministic advection (black solid line) falls within the Lagrangian ensemble range most of the time (red shaded area). This is because the basis of these ensembles is the low pass of the deterministic Lagrangian forecast. These results also show the different nature of the chosen events. Figures 13a and 13c show that these events, which consist of widespread stratiform rainfall systems, have obtained significantly higher correlation values than the events associated with isolated and fast-evolving mesoscale forced systems (Figs. 13b,d).

Another important result obtained from Fig. 13 is that analog ensembles have at least one successful member for all the lead times and events. However, the Lagrangian ensembles do not show similar results and they lose predictability for the longer lead times in all the events except 18 April. On the other hand, we can see that the Lagrangian ensembles are better correlated with observed rainfall fields than the analog members from the initial lead time to 3–5 h later. The transition from Lagrangian to analogs as the best forecasting method happens at different lead times and it is related to the type of event (these transition times are summarized in Table 3). The events mostly driven by the advection of frontal-type stratiform rainfall fields, in which growth and decay play a secondary role (e.g., the 18 April and 6 June events), have a flatter slope of the correlation function with lead time. The mesoscale forced events, where the growth and decay of precipitation cells play an important role in the evolution of the rainfall, have a steeper slope. However, the analog members show another behavior. They have a similar slope for the four events and they show less sensitivity to the type of event. For this reason, the analog ensembles outperform the Lagrangian ensembles earlier in the forecast for the mesoscale forced events (Figs. 13b,d) than for the synoptic forced events (Figs. 13a,c). Despite this, forecasting skill for highly predictable events is still better than for the less predictable ones. The analogs selected at the initial time are closer (have a higher correlation) to the observed rainfall field for highly predictable events (between 0.6 and 0.5) than for the less predictable events (between 0.5 and 0.4). Consequently, there are analog members with predictive skill for all the events, but the number of members with skill is larger for the large-scale forcing events.

Table 3. Summary of the transition times for the four events and their intrinsic lifetime. The lifetime of the event is determined by the decorrelation time in Lagrangian coordinates. The transition time is computed as the time when the mean of the ensemble of analogs equals the mean of the Lagrangian ensemble for each of the different indices.
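The transition-time definition above can be illustrated with a minimal Python sketch. It assumes that the ensemble-mean score of each method is available at discrete lead times and that the crossing point is located by linear interpolation between the two bracketing lead times; the interpolation, the function name `transition_time`, and its arguments are assumptions made for illustration and are not taken from the text.

```python
def transition_time(lead_times, lagrangian_mean, analog_mean):
    """Return the lead time at which the two ensemble-mean scores are equal.

    The crossing is located by linear interpolation between consecutive
    lead times (an assumption of this sketch). Returns None when the two
    curves never meet within the forecast period.
    """
    # Signed difference between the two mean scores at every lead time.
    diffs = [a - l for a, l in zip(analog_mean, lagrangian_mean)]
    for i in range(1, len(diffs)):
        d0, d1 = diffs[i - 1], diffs[i]
        if d0 == 0.0:
            # The means are exactly equal at a sampled lead time.
            return lead_times[i - 1]
        if d0 * d1 < 0.0:
            # Sign change: interpolate linearly to the zero crossing.
            frac = d0 / (d0 - d1)
            return lead_times[i - 1] + frac * (lead_times[i] - lead_times[i - 1])
    if diffs and diffs[-1] == 0.0:
        return lead_times[-1]
    return None
```

Because the function only looks for a sign change in the difference, it can be applied both to scores where higher is better (correlation, ROC area) and to scores where lower is better (Brier score).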

Once the members have been studied independently, the probabilistic forecast is verified. The probability forecast is defined by the probability of rainfall occurrence, with a value between 0 and 1. To compare these probabilistic forecasts with the deterministic Lagrangian extrapolation, a value of 1 is given to the reflectivity values greater than 15 dB*Z* and 0 otherwise. The two probabilistic scores introduced in section 3b are used to verify these probabilistic forecasts. The ROC area assesses the ability of the forecast to discriminate between events and nonevents, whereas the Brier score gives an idea of the magnitude of the probability forecast errors.
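A minimal Python sketch of this step is given below. It assumes that the probability of occurrence is estimated as the fraction of ensemble members exceeding the 15 dBZ threshold at each pixel and that each field is stored as a flattened list of reflectivity values; the estimator, the data layout, and the function names are assumptions for illustration rather than details stated in the text.

```python
RAIN_THRESHOLD_DBZ = 15.0  # rain/no-rain threshold quoted in the text


def to_binary(field, threshold=RAIN_THRESHOLD_DBZ):
    """Return 1.0 where reflectivity is greater than the threshold, else 0.0."""
    return [1.0 if z > threshold else 0.0 for z in field]


def probability_of_rain(members, threshold=RAIN_THRESHOLD_DBZ):
    """Estimate the probability of occurrence at each pixel.

    Assumption of this sketch: the probability is the fraction of
    ensemble members whose reflectivity exceeds the threshold.
    """
    n_members = len(members)
    binary_members = [to_binary(m, threshold) for m in members]
    # zip(*...) groups the values of all members pixel by pixel.
    return [sum(pixel) / n_members for pixel in zip(*binary_members)]
```

Under these assumptions, the deterministic Lagrangian extrapolation is simply passed through `to_binary`, so that it can be verified with the same probabilistic scores as the ensembles.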

The pooled ROC areas for the four events as a function of the lead time are plotted in Fig. 14. Values at or below 0.5 indicate a probabilistic forecast with no skill. As was observed from the correlation scores of individual members, the Lagrangian forecasts (both the deterministic and probabilistic ones) outperform the analog probabilistic forecast for the first few hours. Furthermore, highly predictable events (Figs. 14a,c) have better ROC area scores than the events caused by mesoscale forcing (Figs. 14b,d). However, the lead time at which the analog forecasts outperform the Lagrangian ensemble forecast is earlier for large-scale forcing events than for the less predictable ones, contrary to what was obtained for the individual ensemble members through the correlation score. This difference may arise because the ROC score measures the ability to discriminate between rain and no rain, whereas the cross correlation measures the similarity between rainfall patterns.
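For reference, the ROC area can be computed without explicitly building the ROC curve, using its standard equivalence with the probability that a randomly chosen rain pixel receives a higher forecast probability than a randomly chosen no-rain pixel (ties counting one half). The Python sketch below uses that rank-based form; it is an illustrative implementation with a hypothetical function name, and it is not necessarily the procedure used to produce Fig. 14.

```python
def roc_area(prob, obs):
    """Area under the ROC curve from forecast probabilities and binary observations.

    Uses the pairwise (Mann-Whitney) form: the fraction of event/nonevent
    pixel pairs in which the event pixel has the higher forecast
    probability, with ties counted as one half.
    """
    events = [p for p, o in zip(prob, obs) if o == 1]
    nonevents = [p for p, o in zip(prob, obs) if o == 0]
    if not events or not nonevents:
        raise ValueError("both rain and no-rain pixels are required")
    wins = 0.0
    for p_event in events:
        for p_nonevent in nonevents:
            if p_event > p_nonevent:
                wins += 1.0
            elif p_event == p_nonevent:
                wins += 0.5
    return wins / (len(events) * len(nonevents))
```

A forecast that assigns the same probability everywhere yields 0.5 with this function, which corresponds to the no-skill level mentioned above. The pairwise loop is quadratic in the number of pixels, so a sorted-rank variant would be preferable for full radar composites.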

The ROC area for the analog forecasts has values around 0.95 for both 18 April and 6 June, whereas values of 0.9 or even lower are obtained for the other events. Figures 14a and 14c indicate an almost flat slope (or a quasi-constant ROC area value) for all lead times. On the other hand, Figs. 14b and 14d show a steeper slope and the ROC area values fall below 0.8. Finally, it has to be mentioned that even though all the forecasts show skill for all lead times, analog forecasts have a better ability to discriminate between rain/no-rain areas than the Lagrangian ensembles for most of the forecasting period (the exact times within the forecasting period can be found in Table 3).

Once the ability to discriminate between rain/no-rain has been verified, the probability forecast errors are studied using the Brier score. Figure 15 shows this score for the four events. Interquartile ranges (IQ) obtained from the several forecasts used at each lead time are also plotted. As can be deduced from Eq. (2), the Brier score is similar to the mean square error (MSE) using the probability of occurrence instead of the reflectivity value. Consequently, it has a dependence on the coverage of the observed rainfall field. This means that the smaller the rainfall area, the easier it is to get a good Brier score. This behavior can be observed in Fig. 15 where the lowest Brier score (0.15) for the first forecast time (*t* = 15 min) is obtained for the 22 July event (Fig. 15d), which is the event with the smallest coverage (Table 1). Likewise, the highest score (0.26) is obtained for the event of 18 April, which has the largest coverage, according to Table 1.
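The coverage dependence can be made concrete with a short Python sketch. It assumes the standard form of the Brier score, the mean squared difference between the forecast probability and the binary observation over all pixels; the function name and the toy fields are hypothetical and serve only as an illustration.

```python
def brier_score(prob, obs):
    """Mean squared difference between forecast probability and binary observation."""
    if len(prob) != len(obs):
        raise ValueError("forecast and observation must have the same size")
    return sum((p - o) ** 2 for p, o in zip(prob, obs)) / len(prob)


# Toy illustration of the coverage dependence: a forecast of "no rain
# anywhere" scores exactly the observed rain fraction.
small_coverage = [1] * 1 + [0] * 9   # 10% of pixels with rain
large_coverage = [1] * 5 + [0] * 5   # 50% of pixels with rain
no_rain_forecast = [0.0] * 10

score_small = brier_score(no_rain_forecast, small_coverage)
score_large = brier_score(no_rain_forecast, large_coverage)
```

In this toy case the same uninformative forecast obtains a lower (better) score for the field with the smaller rain area, which is why Brier scores are compared here between methods for the same event rather than between events.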

Focusing on the Lagrangian ensembles, it has to be mentioned that the improvement obtained in the Brier score is related to the smoothness caused by the probabilistic technique applied. The methodology applied to obtain the probabilistic forecast is based on the low-pass extrapolated field plus a perturbation reproducing the variability at small scales. Consequently, even though each ensemble is a plausible realization of the future rainfall field, the probability of occurrence field obtained from these ensembles is similar to the low-pass rainfall field (*Z*_{L}). The threshold used in the low-pass filter to obtain *Z*_{L} is increasing as a function of lead time (because temporal autocorrelation of the Lagrangian extrapolation is decreasing). Probability fields are becoming smoother with lead time and, for this reason, Brier scores for both Lagrangian probabilistic and deterministic forecasts are moving away from each other. Therefore, the distance between these two forecasts provides information about the predictability skill of the large-scale patterns, showing good predictive skill of the large-scale patterns for the high predictable events (Figs. 14d and 15c). We note that the magnitude of errors for the different events shows similar results as obtained with the decorrelation time (Table 1). The increase of Brier score during the forecasting period is 0.03 for the 18 April event [decorrelation time (dt)

Finally, the Brier score as a function of lead time for the analogs’ probabilistic forecasts is studied and compared to the Lagrangian one. Again, the temporal evolution of the probabilistic forecasts is different for the large-scale forced events (Figs. 15a and 15c) than for the mesoscale forced events (Figs. 15b and 15d). The magnitude of the errors in the probability forecasts is almost constant for the high predictable events, whereas it increases notably for the less predictable events. This result agrees with that obtained using the ROC area.

Although the Brier score depends on the total rainfall area, it can be used to compare the errors of the Lagrangian and the analog-based ensembles for the same event. The transition time between the optimal techniques does not agree with the results obtained with the ROC area. This transition takes place around the second hour of lead time, but it shows no relation to the predictability of the events (the exact transition times are presented in Table 3). However, when the IQ range is taken into account to determine when this transition takes place, the analog-based probabilistic forecast becomes the optimal one after a lead time of 2 h for the highly predictable events, whereas for the less predictable events the transition occurs either later (around 6 h for the 22 July event) or never takes place.

## 5. Conclusions

In this contribution, two ensemble-generation nowcasting techniques have been compared for four precipitation events. The first is based on the stochastic perturbation of the Lagrangian extrapolation of the last observed rainfall field, which introduces the uncertainty associated with the growth and decay of the precipitation field; its details can be found in Atencia and Zawadzki (2014). The second technique, introduced in the present paper, is based on the idea of searching for analog situations in a large dataset and using them as members of the forecasting ensemble.

The analog-based technique looks for situations similar to the current one in a large dataset (~15 years of data). As the aim here is rainfall nowcasting (forecasts from 0 to a few hours), the search is focused on similar precipitation patterns. The correlation coefficient is used to quantify this similarity. The three-step selection procedure developed in this work chooses the long-lasting members among those that have similar rainfall patterns. In other words, after similar states are determined by their spatial pattern similarity, two further steps discard those members that are not actual analogs when their synoptic forcing or temporal evolution is compared. The real analogs, in contrast with the false analogs, will behave similarly for a longer period. These two extra steps have discarded some states but, for the four studied events, have retained at least one member with good predictive skill for the entire forecasting period (10 h). Consequently, this technique selects the best analogs possible, and the analog rainfall field has a spatial and temporal structure similar to the observed one. Even though it has been shown that the dataset is long enough to search for analogs, it is important to emphasize that the longer the database, the better the match.
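The first step of the selection, ranking archived states by their pattern similarity to the current rainfall field, can be sketched in Python as follows. The sketch assumes that the similarity is the Pearson correlation between flattened fields and that a fixed number of best matches is retained; the archive layout, the parameters `k` and `min_corr`, and the function names are hypothetical, and the two later steps (synoptic forcing and temporal evolution) are not represented.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equally sized fields."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        # A constant field has no pattern to correlate; treat as no similarity.
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def select_analogs(current, archive, k=3, min_corr=0.0):
    """Return the k archived states best correlated with the current field.

    `archive` maps a label (for example a timestamp string) to a flattened
    field. The result is a list of (label, correlation) pairs sorted from
    the most to the least similar state.
    """
    scored = [(pearson(current, field), label) for label, field in archive.items()]
    scored = [item for item in scored if item[0] >= min_corr]
    scored.sort(reverse=True)
    return [(label, corr) for corr, label in scored[:k]]
```

In such a scheme the candidates returned by `select_analogs` would then be passed to the subsequent filtering steps, which remove states that match the spatial pattern but differ in forcing or evolution.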

The comparison between the two techniques leads to different conclusions. Regarding the individual members, the Lagrangian ensembles have a higher correlation than the analogs for several hours from the start. The time at which the optimal forecast technique changes depends on the predictability of the event: it is around 3–4 h for the less predictable events and more than 6 h for the highly predictable events. The comparison shows that the correlation of the Lagrangian ensembles follows the exponential slope of the temporal autocorrelation (or the quality of the Lagrangian forecast), whereas the correlation of the analog members has an almost flat slope with lead time. This means that the evolution of these analog states is quite similar throughout the whole forecast period.

Regarding the probability of occurrence field, it has been shown that the Lagrangian forecast has better skill (both in terms of the Brier score and in terms of the ROC area) than the analog-based forecast for the first hours. However, the time of transition for the optimal forecast has decreased notably. The time is around 2–3 h for the highly predictable events and 3–4 h for the mesoscale forced events. Despite the low correlation of individual members, the information of the joint probability has more physical meaning in the analog-based forecast than just the large-scale extrapolated information used in the Lagrangian forecast. For this reason, it can be concluded that the analog-based probabilistic forecast has better forecasting skill than the stochastic Lagrangian ensemble approach.

However, the analog forecast has to be studied in more depth and using more events to obtain a large dataset evaluation that will provide information to understand other characteristics such as the inherent predictability of the atmosphere. Also, it should be compared to other probabilistic forecasting systems, such as NWP ensembles (with data assimilation to make a fair comparison in a nowcasting framework) to ensure that the evolution of the rainfall field observed in the individual members is similar to that obtained with the NWP models.

*Acknowledgments.* The authors express their gratitude to the Global Hydrology and Climate Center (GHRC) for providing access to the WSI radar composites data and the National Weather Service (NWS) for the NARR data. Special thanks go to all our group members for their fruitful discussions during the group meetings and especially to Georgina Paull for their corrections and suggestions to improve the manuscript. Three anonymous reviewers and Geoff Pegram have notably improved this manuscript through their corrections.

## REFERENCES

Atencia, A., 2010: Integración de modelos meteorológicos, hidrológicos y predicción radar para la previsión de crecidas en tiempo real (Integration of NWP, hydrological models and radar-based nowcasting for flood forecasting in real time). Ph.D. thesis, University of Barcelona, 296 pp.

Atencia, A., and I. Zawadzki, 2014: A comparison of two techniques for generating nowcasting ensembles. Part I: Lagrangian ensemble technique. *Mon. Wea. Rev.*, **142**, 4036–4052, doi:10.1175/MWR-D-13-00117.1.

Atencia, A., I. Zawadzki, and F. Fabry, 2013: Rainfall attractors and predictability. *36th Conf. on Radar Meteorology*, Breckenridge, CO, Amer. Meteor. Soc., 15B.5. [Available online at https://ams.confex.com/ams/36Radar/webprogram/Paper228759.html.]

Bartholmes, J., and E. Todini, 2005: Coupling meteorological and hydrological models for flood forecasting. *Hydrol. Earth Syst. Sci.*, **9**, 333–346, doi:10.5194/hess-9-333-2005.

Berenguer, M., D. Sempere-Torres, and G. Pegram, 2011: SBMcast—An ensemble nowcasting technique to assess the uncertainty in rainfall forecasts by Lagrangian extrapolation. *J. Hydrol.*, **404**, 226–240, doi:10.1016/j.jhydrol.2011.04.033.

Berne, A., G. Delrieu, J.-D. Creutin, and C. Obled, 2004: Temporal and spatial resolution of rainfall measurements required for urban hydrology. *J. Hydrol.*, **299**, 166–179, doi:10.1016/S0022-1694(04)00363-4.

Bowler, N., C. Pierce, and A. Seed, 2006: STEPS: A probabilistic precipitation forecasting scheme which merges an extrapolation nowcast with downscaled NWP. *Quart. J. Roy. Meteor. Soc.*, **132**, 2127–2155, doi:10.1256/qj.04.100.

Diomede, T., F. Nerozzi, T. Paccagnella, and E. Todini, 2006: The use of meteorological analogues to account for LAM QPF uncertainty. *Hydrol. Earth Syst. Sci.*, **3**, 3061–3097, doi:10.5194/hessd-3-3061-2006.

Dobryshman, Y., 1972: Review of forecast verification techniques. World Meteorological Organization, Tech. Note 120, 51 pp.

Foresti, L., M. Kanevski, and A. Pozdnoukhov, 2012: Kernel-based mapping of orographic rainfall enhancement in the Swiss Alps as detected by weather radar. *IEEE Trans. Geosci. Remote Sens.*, **50**, 2954–2967, doi:10.1109/TGRS.2011.2179550.

Foresti, L., L. Panziera, P. V. Mandapaka, U. Germann, and A. Seed, 2015: Retrieval of analogue radar images for ensemble nowcasting of orographic rainfall. *Meteor. Appl.*, **22**, 141–155, doi:10.1002/met.1416.

Germann, U., and I. Zawadzki, 2002: Scale-dependence of the predictability of precipitation from continental radar images. Part I: Description of the methodology. *Mon. Wea. Rev.*, **130**, 2859–2873, doi:10.1175/1520-0493(2002)130<2859:SDOTPO>2.0.CO;2.

Germann, U., I. Zawadzki, and B. Turner, 2006: Predictability of precipitation from continental radar images. Part IV: Limits to prediction. *J. Atmos. Sci.*, **63**, 2092–2108, doi:10.1175/JAS3735.1.

Gilleland, E., D. Ahijevych, B. Brown, and E. Ebert, 2010: Verifying forecasts spatially. *Bull. Amer. Meteor. Soc.*, **91**, 1365–1373, doi:10.1175/2010BAMS2819.1.

Golding, B., 1998: Nimrod: A system for generating automated very short range forecasts. *Meteor. Appl.*, **5**, 1–16, doi:10.1017/S1350482798000577.

Harrison, L., K. Norman, C. Pierce, and N. Gaussiat, 2012: Radar products for hydrological applications in the UK. *Water Manage.*, **165**, 89–103, doi:10.1680/wama.2012.165.2.89.

Lorenz, E., 1969: Atmospheric predictability as revealed by naturally occurring analogues. *J. Atmos. Sci.*, **26**, 636–646, doi:10.1175/1520-0469(1969)26<636:APARBN>2.0.CO;2.

Lovejoy, S., and D. Schertzer, 1990: Multifractals, universality classes and satellite and radar. *J. Geophys. Res.*, **95**, 2021–2034, doi:10.1029/JD095iD03p02021.

Mackay, N., R. Chandler, C. Onof, and H. Wheater, 2001: Disaggregation of spatial rainfall fields for hydrological modelling. *Hydrol. Earth Syst. Sci.*, **5**, 165–173, doi:10.5194/hess-5-165-2001.

Mesinger, F., and Coauthors, 2006: North American Regional Reanalysis. *Bull. Amer. Meteor. Soc.*, **87**, 343–360, doi:10.1175/BAMS-87-3-343.

Nishiyama, K., S. Endo, K. Jinno, C. Bertacchi Uvo, J. Olsson, and R. Berndtsson, 2007: Identification of typical synoptic patterns causing heavy rainfall in the rainy season in Japan by a self-organizing map. *Atmos. Res.*, **83**, 185–200, doi:10.1016/j.atmosres.2005.10.015.

Obled, C., G. Bontron, and R. Garçon, 2002: Quantitative precipitation forecasts: A statistical adaptation of model outputs through an analogues sorting approach. *Atmos. Res.*, **63**, 303–324, doi:10.1016/S0169-8095(02)00038-8.

Panziera, L., U. Germann, M. Gabella, and P. Mandapaka, 2011: NORA—Nowcasting of orographic rainfall by means of analogues. *Quart. J. Roy. Meteor. Soc.*, **137**, 2106–2123, doi:10.1002/qj.878.

Pegram, G., and A. Clothier, 2001: High resolution space–time modelling of rainfall: The “string of beads” model. *J. Hydrol.*, **241**, 26–41, doi:10.1016/S0022-1694(00)00373-5.

Root, B., P. Knight, G. Young, S. Greybush, R. Grumm, R. Holmes, and J. Ross, 2007: A fingerprinting technique for major weather events. *J. Appl. Meteor. Climatol.*, **46**, 1053–1066, doi:10.1175/JAM2509.1.

Surcel, M., I. Zawadzki, and M. K. Yau, 2015: A study on the scale dependence of the predictability of precipitation patterns. *J. Atmos. Sci.*, **72**, 216–235, doi:10.1175/JAS-D-14-0071.1.

Van den Dool, H., 1994: Searching for analogues, how long must we wait? *Tellus*, **46A**, 314–324, doi:10.1034/j.1600-0870.1994.t01-2-00006.x.

Venugopal, V., E. Foufoula-Georgiou, and V. Sapozhnikov, 1999: Evidence of dynamic scaling in space-time rainfall. *J. Geophys. Res.*, **104**, 31 599–31 610, doi:10.1029/1999JD900437.

Whitaker, J. S., and A. F. Loughe, 1998: The relationship between ensemble spread and ensemble mean skill. *Mon. Wea. Rev.*, **126**, 3292–3302, doi:10.1175/1520-0493(1998)126<3292:TRBESA>2.0.CO;2.

Xue, M., and Coauthors, 2008: CAPS realtime storm-scale ensemble and high-resolution forecasts as part of the NOAA Hazardous Weather Testbed 2008 Spring Experiment. *24th Conf. on Severe Local Storms*, Savannah, GA, Amer. Meteor. Soc., 12.2. [Available online at https://ams.confex.com/ams/24SLS/techprogram/paper_142036.htm.]

Zappa, M., and Coauthors, 2010: Propagation of uncertainty from observing systems and NWP into hydrological models: COST-731 working group 2. *Atmos. Sci. Lett.*, **11**, 83–91, doi:10.1002/asl.248.

Zawadzki, I., 1973: Statistical properties of precipitation patterns. *J. Appl. Meteor.*, **12**, 459–472, doi:10.1175/1520-0450(1973)012<0459:SPOPP>2.0.CO;2.