## 1. Introduction

Feature calibration and alignment (FCA) uses a variational algorithm to partition errors into phase errors, bias (or amplification) errors, and residual small-scale errors (Hoffman and Grassotti 1996). In this pilot study, we have explored the use of FCA in what might be termed a reverse mode to generate a simulated (or pseudo) ensemble of forecasts from a single forecast using forecasts of the global European Centre for Medium-Range Weather Forecasts (ECMWF) model and our previous analysis of the ECMWF FCA statistics (Nehrkorn et al. 2003, hereinafter QJ03). The simulated ensembles were then compared with ECMWF ensemble forecasts of 500-hPa geopotential height over the North American region for the 2003/04 winter season. This method could be applied as well to increase the diversity of an ensemble of dynamical forecasts.

In previous work (Hoffman et al. 1995) we proposed the FCA method to characterize errors for meteorological data analysis and verification. In its simplest form, we decompose forecast error into a part attributable to phase errors and a remainder. The phase error is represented in the same fashion as a velocity field and is required to vary slowly and smoothly with position. FCA is a general method to compare two datasets. In this pilot study, we use FCA in reverse to dress individual dynamical forecasts statistically. Statistical dressing refers to the process of generating random but “statistically reasonable” differences and adding these differences to a single dynamical forecast or to the members of an ensemble of dynamical forecasts to derive a large or larger ensemble of representative forecasts (Roulston and Smith 2003). The term statistically reasonable means the differences could have been drawn from the probability distribution of forecast errors. The method of calculating the random differences and, in particular, the means of representing the forecast errors is key to the process of statistical dressing. For generating pseudoensembles, the application of FCA will fall into two distinct phases. In the training phase we will determine the statistics of the coefficients of the Fourier expansions. Other parsimonious representations, such as wavelets, are also possible. In the application phase we will choose these coefficients randomly but with the correct statistics, thereby generating realistic ensembles of smooth three-dimensional fields of displacement and bias correction. Each realization of the three-dimensional fields of displacement and bias correction will be applied to the single actual forecast to produce a realistic ensemble of pseudoforecasts. A proof-of-concept study applying FCA displacement components only to 500-hPa geopotential heights and based on the datasets and results of QJ03 is described in this paper.

The generation and evaluation of ensemble forecasts have been the subject of numerous recent studies. A brief summary of the generation of ensemble forecasts and their interpretation can be found in Roulston and Smith (2003). Their paper addresses the question of how statistical enhancements or dressing of single deterministic forecasts can be generalized to dressing individual members of dynamical ensembles, using the “best member” method. In this method, the forecast error of the best ensemble member of archived ensemble forecasts is added to the deterministic forecast. Wang and Bishop (2004, 2005) recently suggested an alternative dressing kernel that avoids the difficulties associated with identifying the best ensemble member. The approach of Roulston and Smith (2003) ensures that the generated forecast differences are typical of actual forecast differences, but these differences bear no relationship with the underlying synoptic situation. The FCA-based dressing procedure explored here has the advantage that errors will be associated with synoptic features.

## 2. Background

When comparing a forecast field with an analysis, the human eye can readily distinguish between phase errors and amplitude errors of features such as closed contours or sharp gradients. On the other hand, standard measures of forecast skill, such as rms error, anomaly correlation, or other skill scores, all measure forecast error as the difference between a forecast and an analysis at the same point in space and time. For these measures, relatively minor position and amplitude errors can lead to large mean-square errors. In some cases, displacements of small-scale features in otherwise realistic finescale simulations can result in mean-square errors that are larger than those of coarser-resolution, overly smooth predictions that may be lacking the feature altogether (De Elía and Laprise 2003; White et al. 1999).

If $Z_f$ is the forecast and $Z_\upsilon$ is the verifying analysis, then we can write the total error in terms of three components (phase, bias, and residual errors) as

$$Z_f - Z_\upsilon = (Z_f - Z_d) + (Z_d - Z_a) + (Z_a - Z_\upsilon).$$

Letting $\mathbf{r}$ denote position, $Z_d(\mathbf{r}) = Z_f(\mathbf{r} + \mathbf{D})$ is the displaced forecast (i.e., the forecast with phase errors corrected) and $Z_a = Z_d + B$ is the adjusted forecast. The bias and horizontal displacement fields $B(\mathbf{r})$ and $\mathbf{D}(\mathbf{r})$ are called the adjustment fields and are to be determined. The FCA decomposition is nonlinear because for each grid point $\mathbf{r}_{ij}$ we define the displaced forecast $Z_d$ as equal to the original forecast at the source location ($\mathbf{r}_{ij} + \mathbf{D}_{ij}$). FCA is general, and what we have called here the verifying analysis could be another forecast or, at the cost of introducing an observation operator, a set of randomly distributed observations.

The adjustment fields are determined by minimizing an objective function $J = J_r + J_d$, in which $J_r$ measures the residual error component and $J_d$ measures how unlikely the adjustment fields are.

The FCA algorithm used here is an efficient variational algorithm that represents the alignment and calibration spectrally. When applied to two-dimensional fields in a local area, such as in the study using integrated water vapor by Hoffman and Grassotti (1996), the displacement and bias correction fields determined by FCA are represented by double Fourier expansions (sines and cosines) in terms of the map coordinates and so are automatically smooth. For a global or hemispheric domain, spherical harmonics are used as the basis functions. Additional smoothness can be obtained by defining $J_d$ to penalize high wavenumbers more than low wavenumbers in the spectral expansion. We have not yet applied FCA to three-dimensional data, but we anticipate that the vertical shear of the displacement and bias correction fields determined by FCA might also be represented by double Fourier expansions and so might vary linearly with height at each map location in a smooth manner.
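As an illustration of the displaced forecast, in which the value at each grid point is taken from the original forecast at the source location $\mathbf{r} + \mathbf{D}$, the following minimal Python sketch evaluates a small gridded field at displaced positions. The bilinear interpolation, the use of grid-index units for the displacement, and the clamping of source points to the domain edge are assumptions made for this example rather than details of the FCA implementation, and the function name `displace` is hypothetical.

```python
import math

def displace(field, dx, dy):
    """Return Z_d with Z_d[j][i] = Z_f evaluated at (i + dx[j][i], j + dy[j][i]).

    field, dx, dy are lists of rows (at least 2 x 2). Displacements are in
    grid-index units (an assumption of this sketch). Source points outside
    the grid are clamped to the nearest edge; values are interpolated
    bilinearly.
    """
    ny, nx = len(field), len(field[0])
    out = [[0.0] * nx for _ in range(ny)]
    for j in range(ny):
        for i in range(nx):
            # Source location, clamped to the domain.
            x = min(max(i + dx[j][i], 0.0), nx - 1.0)
            y = min(max(j + dy[j][i], 0.0), ny - 1.0)
            # Lower-left corner of the enclosing grid cell.
            i0 = min(int(math.floor(x)), nx - 2)
            j0 = min(int(math.floor(y)), ny - 2)
            fx, fy = x - i0, y - j0
            out[j][i] = ((1 - fx) * (1 - fy) * field[j0][i0]
                         + fx * (1 - fy) * field[j0][i0 + 1]
                         + (1 - fx) * fy * field[j0 + 1][i0]
                         + fx * fy * field[j0 + 1][i0 + 1])
    return out

# Hypothetical field increasing linearly in x; the source is shifted by
# half a grid length in x and not at all in y.
z_f = [[float(i) for i in range(5)] for _ in range(4)]
half = [[0.5] * 5 for _ in range(4)]
zero = [[0.0] * 5 for _ in range(4)]
z_d = displace(z_f, half, zero)
```

Because the displaced field only samples values already present in the forecast, it can never fall below a nearby minimum or exceed a nearby maximum, and it changes little where gradients are weak.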

The residual term $J_r$ accumulates the squared residual error $Z_a - Z_\upsilon$, normalized by $\sigma_z^2$, over the $N$ grid points, where $\sigma_z$ is the expected standard deviation of the residual error. Increasing the value of $\sigma_z$ will increase the proportion of error attributed to residual error and will decrease the FCA error components. For the penalty function $J_d$, a simple definition is in terms of the spectral coefficients of the adjustment,

$$J_d = \sum_k \left(\frac{C_k}{S_k}\right)^2,$$

where $C_k$ is the $k$th element of the vector of spectral coefficients and $S_k$ is the estimated standard deviation of the $k$th spectral coefficient. The $S_k$ are tunable parameters. A specific choice to ensure smoothness is to define $J_d$ as the domain-integrated squared Laplacian of the adjustment fields. Then, for a Fourier representation, each $S_k^{-1}$ is equal to the sum of squares of the wavenumbers associated with the $k$th spectral coefficient.
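The effect of the smoothness choice can be made concrete with a short Python sketch. It assumes that the penalty is the sum of squared coefficients, each divided by its standard deviation, and that the inverse standard deviation of a Fourier coefficient is the sum of squares of its two wavenumbers. The wavenumber pairs and amplitudes are hypothetical, and the zero-wavenumber (domain mean) term is left out because its inverse standard deviation would vanish.

```python
def penalty(coeffs, stdevs):
    """J_d as the sum over k of |C_k / S_k|**2 (assumed form)."""
    return sum(abs(c / s) ** 2 for c, s in zip(coeffs, stdevs))

def laplacian_stdevs(wavenumbers):
    """S_k = 1 / (kx**2 + ky**2) for each (kx, ky) pair, i.e. the choice in
    which 1 / S_k is the sum of squares of the wavenumbers."""
    return [1.0 / (kx * kx + ky * ky) for kx, ky in wavenumbers]

wavenumbers = [(1, 0), (0, 1), (2, 1), (3, 3)]   # hypothetical (kx, ky) pairs
coeffs = [1.0, 1.0, 1.0, 1.0]                    # equal amplitude at every scale
s_k = laplacian_stdevs(wavenumbers)
j_d = penalty(coeffs, s_k)
terms = [abs(c / s) ** 2 for c, s in zip(coeffs, s_k)]
```

With equal amplitudes, the contribution of each coefficient grows as the fourth power of its total wavenumber, so the highest-wavenumber term dominates the penalty.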

QJ03 describe a method to specify the $S_k$ in a more objective manner by analyzing a historical sample of forecasts and verifying observations or analyses. In that study, the $S_k$ are determined from the statistics of the FCA solutions found in a stepwise procedure using no constraints but optimally choosing the spectral truncation of the FCA adjustments.

In this study, we apply the FCA method in reverse: starting from a priori statistics of the spectral coefficients, random realizations of displacements are generated and are used to produce perturbations of a single deterministic forecast.

## 3. Results

### a. Generation of FCA pseudoensembles using low-resolution hemispheric height fields

Displacement and bias correction fields were randomly generated using the FCA statistics available from our previous study of ECMWF forecasts (QJ03). These statistics were generated from Northern Hemisphere 500-hPa geopotential height fields available in spectral form [with a triangular truncation at 40 wavenumbers (T40)], for forecasts from 1 to 10 days. The complex spectral coefficients of the simulated adjustments are generated from random numbers and are scaled so that the rms statistics of the generated coefficients match those of the a priori statistics. We use normally distributed random-number sequences separately for the real and imaginary parts. [See Anderson (1990) for a review of random-number generators.]
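The sampling step can be sketched as follows in Python. The sketch draws the real and imaginary parts of each coefficient independently from a normal distribution and checks the sample rms against the target, in the spirit of the consistency check described in the text. Splitting the variance equally between real and imaginary parts is an assumption of the example, as are the function names and the a priori rms values.

```python
import math
import random

def simulate_coefficients(rms, n_members, seed=0):
    """Draw complex coefficients whose expected rms modulus matches rms[k]."""
    rng = random.Random(seed)
    members = []
    for _ in range(n_members):
        # Real and imaginary parts each carry half of the variance, so that
        # E|C_k|^2 = rms[k]^2 (an assumption of this sketch).
        members.append([complex(rng.gauss(0.0, s / math.sqrt(2)),
                                rng.gauss(0.0, s / math.sqrt(2)))
                        for s in rms])
    return members

def sample_rms(members):
    """Root-mean-square modulus of each coefficient across the sample."""
    n = len(members)
    return [math.sqrt(sum(abs(m[k]) ** 2 for m in members) / n)
            for k in range(len(members[0]))]

a_priori = [120.0, 60.0, 15.0]                     # hypothetical rms values
small = simulate_coefficients(a_priori, 30)        # a 30-member sample
large = simulate_coefficients(a_priori, 20000)     # a much larger sample
rms_small, rms_large = sample_rms(small), sample_rms(large)
```

For a 30-member sample the rms values scatter noticeably about the targets, whereas the large sample reproduces them closely. Drawing each coefficient independently, as here, ignores any correlations between coefficients.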

A sample of 30 FCA adjustments was generated for each forecast time from 1 to 10 days. This set of adjustments (300 in total for each initial time) is referred to as simulation 1 or “S1” in the following. As a consistency check, the rms statistics of the randomly chosen FCA spectral coefficients were compared with the a priori rms constraints and were found to be in close agreement, aside from differences resulting from the limited sample size. This comparison was done separately for each forecast lead time, from 1 to 10 days. Visual inspection of the gridded S1 FCA fields showed similar scales and magnitudes as the FCA displacement and bias solutions from QJ03. A pseudoensemble of 1–10-day forecasts was generated for a total of 30 forecast initial times by applying these S1 FCA adjustments to the appropriate verifying analysis.

For an in-depth evaluation of the pseudoensembles, we examined individual forecasts at selected grid points in the domain. Examination of individual forecasts revealed strongly overdispersive ensembles at low latitudes. Comparison of the rms errors of the actual deterministic forecasts with those of the pseudoensemble demonstrates this effect clearly. Note that, because the S1 ensemble members are obtained by applying the adjustments to the verifying analysis, the S1 forecast error is approximately equal to the ensemble spread. Examination of the FCA solutions at low-latitude grid points showed that the overdispersion was due to the simulated bias adjustment, which had values far in excess of the QJ03 FCA solutions at low latitudes. This result indicates that the simulation strategy is deficient for the bias-adjustment FCA component. A more correct strategy would account for the correlations between the spectral coefficients. Estimating the full covariance matrix of the spectral coefficients would require a much larger sample than was used in QJ03.

In general, biases are very important, but in this pilot study of the 500-hPa geopotential height we are primarily interested in the displacement components of FCA. We therefore performed a simple test in which a new set of pseudoensembles was generated (simulation 2 or “S2”), using the previously generated S1 spectral coefficients except that all bias adjustments were set to zero. The S2 ensemble spread is now much smaller at lower latitudes, and in general the S2 forecast error is much closer to that of the original dynamic forecasts (except that at low latitudes the adjustments are now too small, because the generally small gradients of height there make displacements relatively ineffective at producing height adjustments).

Globally averaged statistics (at all grid points in the Northern Hemisphere) for both S1 and S2 are shown in Table 1. Also shown in Table 1 are the forecast errors of the original forecasts and the error explained by the QJ03 FCA solutions for those forecasts. If the simulated FCA adjustments of the pseudoensembles exactly reproduced the adjustments of the original FCA solutions, then the pseudoensemble spread would be comparable to the error explained by the original FCA solutions. As was discussed previously, the S1 pseudoensemble overpredicted the ensemble spread because of inappropriate bias-adjustment fields. For S2, the ensemble spread is still slightly larger than either the error explained by the FCA or even the total forecast error. This result is somewhat surprising, because the S2 FCA solutions lack the bias-adjustment component, relying on displacements alone to effect the adjustments. A partial explanation, found in the analysis of the QJ03 results, is that the FCA-determined bias adjustments sometimes act to oppose the effect of the FCA-determined displacements. This phenomenon is absent from both the S1 solutions, because bias adjustments and displacements are assumed to be uncorrelated, and the S2 solutions, because bias adjustments are zero.

A more general approach might be needed in other situations because displacements (or biases) obtained by randomly sampling a population of displacements (or biases) consistent with the FCA error statistics for a forecast will, in general, not be aligned with features of interest in the forecast and thus may underestimate the true variability of the forecast. In these cases, it may be necessary to include a scaling of the error magnitudes by a tunable constant or to include the observed correlation between displacement magnitudes and gradients of the forecast fields in the simulation procedure.

### b. Comparison of FCA and dynamic ensemble forecasts over North America

#### 1) Procedure

Based on the generally successful consistency check for S2, the FCA pseudoensemble technique was compared with a set of ECMWF 51-member, 1–10-day, 500-hPa geopotential height ensemble forecasts archived at Atmospheric and Environmental Research, Inc., for the North American region (0°–90°N, 135°–45°W) and referred to as eCast in the following. For simulation 3, or “S3,” each FCA-generated ensemble consists of the 500-hPa geopotential height from the unperturbed (or control) ECMWF forecast from the operational ECMWF analysis and 50 perturbations. To simulate more closely the operational application of our technique, the pseudoensemble was generated by applying the FCA adjustment to the unperturbed forecast rather than the verifying analysis. In the ideal situation, the FCA perturbations would be based on the FCA statistics of the differences between pairs of ensemble members rather than on the historical forecast errors of the deterministic forecasts, as was analyzed in QJ03.

The 50 FCA solutions for each forecast day from 1 to 10 were generated using the same spectral coefficient statistics as for S2 (i.e., using zero bias adjustments). This set of adjustments (500 in total for each initial time) was applied to higher-resolution (1.5° × 1.5°) 500-hPa geopotential height forecasts over North America for a total of 62 initial times during the period from October 2003 through April 2004. For grid points near the edge of the limited-area domain, displacement vectors can originate from outside the area. In this case, displaced values used the closest boundary points instead. To minimize the effects of these cases and to eliminate extreme low-latitude and high-latitude points, the evaluation of the ensembles in the following discussion is restricted to the subdomain of 36°–66°N by 112.5°–67.5°W.

#### 2) Evaluation at selected grid points

For an in-depth evaluation of the results, the dynamic and pseudoensembles were examined at selected grid points over the North American domain. In our comparison, the truth or verification is taken to be the operational ECMWF analysis. To provide a reference point for the evaluation of the ensembles, two additional pseudoensembles were generated. The first, known as forecast error variance dressing (feVD), is a purely statistical dressing of the control forecast in which 50 normally distributed random numbers are added to the control forecast at each verification point and lead time. The random numbers are generated with a normal or Gaussian distribution with a zero mean and a standard deviation that is equal to the forecast error standard deviation calculated at each verification point. The second, known as climate variance dressing (cVD), is a statistical dressing of climatological values, which follows the same procedure as was used for feVD except that deviations are added to the monthly mean value at each verification point and are scaled by the climatological standard deviation about the mean. We note that the statistical dressing performed here is based only on the forecast error standard deviation and is thus not directly comparable to the statistical dressing proposed by Roulston and Smith (2003) and Wang and Bishop (2005).
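The two reference dressings differ only in the value they are centered on and in the standard deviation used. A minimal Python sketch for a single verification point and lead time follows; the function name and all numerical values are hypothetical.

```python
import random

def variance_dress(center, stdev, n_members=50, seed=0):
    """Add zero-mean Gaussian deviations with the given standard deviation
    to a central value at one verification point and lead time."""
    rng = random.Random(seed)
    return [center + rng.gauss(0.0, stdev) for _ in range(n_members)]

# Hypothetical 500-hPa heights (m) at one point and lead time.
control_forecast, forecast_error_sd = 5520.0, 45.0
monthly_mean, climate_sd = 5480.0, 110.0

fevd = variance_dress(control_forecast, forecast_error_sd)   # feVD members
cvd = variance_dress(monthly_mean, climate_sd, seed=1)       # cVD members
```

Because the deviations are drawn independently at each point and lead time, neighboring points and successive lead times within one member are unrelated, unlike the smooth fields of a dynamical forecast.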

A side-by-side comparison of dynamic and pseudoensemble forecasts shows various interesting features. An example of 1–10-day forecasts from the eCast and S3 ensembles is shown in Fig. 1, alongside those of the reference pseudoensembles feVD and cVD. This case is for 1200 UTC 28 November 2003 at 42°N, 76.5°W (at Sayre, Pennsylvania, midway between New York City, New York, and Buffalo, New York) and was chosen as an example in which S3 and eCast variability are well correlated. In what follows we will refer to the starting time 1200 UTC 28 November 2003 as “Thanksgiving 2003” and the location 42°N, 76.5°W as “Sayre.” In fact, 28 November 2003 is the Friday of Thanksgiving weekend 2003.

In this example, the eCast and S3 envelopes are both asymmetrically distributed around the leading-member forecast beyond days 4–6. In this case, for both eCast and S3 the verification falls into a higher probability bin than for the more symmetric statistical dressing forecast feVD for some, but not all, of the forecast times. We can find other cases in which the verification is either outside or inside the distributions of any one of the four ensembles for a significant period.

Note that by design the feVD ensemble is always centered on the control forecast and the cVD ensemble is always centered on the climatological value. Further, the uncertainty or spread of the ensemble increases nearly linearly for feVD and stays constant for cVD, with respect to forecast lead time. In contrast, the eCast and S3 ensembles are not centered on any particular forecast and show varying rates of increase of the uncertainty. The ensemble envelope for some days is highly asymmetrical with respect to the unperturbed forecast in both the eCast and S3 ensembles. In other cases, the ensemble envelope can be smaller than for earlier forecast days. There are cases in which the dispersion of the eCast and S3 ensembles agrees very well, and there are other cases in which they are very different.

Here in Fig. 1 [and in Figs. 3–5, described in section 3b(3)], a times sign identifies the 12th member of each ensemble. The fact that it is the 12th and not the 14th or some other is not relevant, but identifying a particular ensemble member allows one to see here that the time continuity of all three statistically dressed ensembles is less than that of the eCast ensembles.

In the next two sections, we present a picture of the spatial patterns of the ensemble forecasts for Thanksgiving 2003 and then a more quantitative comparison of the ensemble forecasts, based on a comparison of the ensemble mean and variance fields.

#### 3) Evaluation of spatial patterns

For Thanksgiving 2003, the spatial patterns of the day-3 ensemble mean and variance fields for eCast and S3 are very similar over the whole domain of interest, as illustrated in Fig. 2. Spatial correlations between eCast and S3 for this day are 0.997 for the mean and 0.891 for the variance fields. For both ensembles, the dominant feature is an area of large variance values located at the trough axis; for the S3 ensemble, this area of large variance values extends farther upstream along the jet axis than for the eCast ensemble and its magnitude is larger.
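The comparison of mean and variance fields can be sketched in Python as below. The sketch assumes an unweighted Pearson correlation over grid points (no area weighting) and the population form of the variance; both are choices made for the example, and the tiny three-point "fields" are hypothetical.

```python
import math

def ensemble_mean_and_variance(members):
    """Per-grid-point mean and (population) variance across ensemble members;
    each member is a flat list of grid-point values."""
    n = len(members)
    columns = list(zip(*members))
    means = [sum(col) / n for col in columns]
    variances = [sum((v - m) ** 2 for v in col) / n
                 for col, m in zip(columns, means)]
    return means, variances

def pattern_correlation(a, b):
    """Unweighted Pearson correlation between two fields on the same grid."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    norm = math.sqrt(sum((x - ma) ** 2 for x in a)
                     * sum((y - mb) ** 2 for y in b))
    return cov / norm

# Two hypothetical 2-member ensembles on a 3-point grid.
mean_a, var_a = ensemble_mean_and_variance([[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]])
mean_b, var_b = ensemble_mean_and_variance([[0.0, 5.0, 4.0], [4.0, 5.0, 0.0]])
variance_correlation = pattern_correlation(var_a, var_b)
```

In this toy case the two variance fields have the same shape but different magnitudes, so their pattern correlation is one even though the second ensemble has four times the spread; the correlation measures agreement in pattern, not in amplitude.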

Spatial characteristics of the forecast ensembles can be further illustrated with scatterplots of ensemble forecast values along constant latitude and longitude lines. For Thanksgiving 2003, Fig. 3 shows the values for grid points along 42°N latitude. The increase in ensemble variance as the grid points approach the trough axis is clearly visible for eCast and S3. By comparison, the feVD envelope is nearly constant in size (and much smaller). In this example, the verification nevertheless falls within the feVD ensemble envelope. In Fig. 3 (as well as in Figs. 4 and 5) the 12th member shows that an individual S3 pseudoensemble member has spatial smoothness similar to that of eCast whereas individual feVD and cVD ensemble members are not spatially smooth.

A limitation of the FCA ensemble technique as implemented here is illustrated in Fig. 4, which shows the values along 57°N latitude. The S3 ensemble has a sharply defined lower limit along this latitude line, a feature that is absent from the eCast ensemble. Because we limited the FCA to displacements only, perturbed forecasts cannot contain values lower than a nearby local minimum value (or, conversely, larger than nearby maximum values). Another limitation of perturbations generated by displacements only is that in areas with small gradients the size of the perturbations is limited. This phenomenon is illustrated by the scatterplots along 76.5°W longitude (Fig. 5), in which the S3 ensemble spread becomes much smaller than that of eCast toward the northern edge of the domain. Note that in this case, although the forecast variability is wrong, the forecast mean is correct and all of the S3 forecasts are very good. However, the goal of ensemble forecasting is to correctly forecast both the mean and variance.

In the southern half of Fig. 5, by contrast, the eCast and S3 ensemble spreads are clearly similar, except that some of the S3 ensemble values at low latitudes are too low. In fact, some S3 ensemble forecasts at low-latitude locations predict extremely low height values, well outside the feVD or even the cVD envelopes. For these same forecasts, the eCast and S3 ensembles may be very similar at other grid points.

A summary plot is shown in Fig. 6, which shows the spatial correlations between the eCast ensemble mean and variance fields and those of the S3, feVD, and cVD pseudoensembles, averaged over all 62 forecasts. All of the correlations of the ensemble mean fields are very high because of the large climatological signal in the 500-hPa geopotential height field. At long forecast lead times, the S3 and feVD ensemble mean forecasts approach the value for cVD, indicating that all three ECMWF-based ensemble means become indistinguishable from climatological values as lead time increases sufficiently. At shorter lead times, the feVD and S3 ensemble mean fields are both very close to that of the dynamic ensemble (eCast), and this similarity remains stronger for S3 than feVD as forecast lead time increases. Comparison of the ensemble variance fields shows a much closer correspondence between S3 and eCast than between either feVD and eCast, or cVD and eCast, for short forecast lead times. By days 6–7, the sample (or time) average of the correlation between eCast and S3 variance fields drops to 0.4 and approaches the value for correlations between eCast and feVD and between eCast and cVD. For eCast at long forecast times, the ensemble members become uncorrelated and the ensemble variance has an expected value of 2 times the climate variance at each latitude and longitude. By design this is also true for feVD and cVD. Therefore, the variance fields for these three ensembles all approach the pattern of climate variance multiplied by two and thus become increasingly spatially correlated.

To illustrate the large day-to-day variability in these statistics, the correlations averaged for all lead times in Fig. 6b are shown as daily values for day-3 forecasts in Fig. 7. Although the sample average of the correlation between eCast and S3 variance fields for day 3 is about 0.5, the daily values vary from near 0.9 to near zero. The success of S3 at capturing the spatial characteristics of the eCast ensemble as seen, for example, in Fig. 2 and more generally in Fig. 6 demonstrates the usefulness of a feature-based statistical description of forecast differences. However, the current examples do not prove the method will work well with meteorological fields other than 500-hPa geopotential heights.

#### 4) Evaluation using ensemble verification metrics

We applied objective verification measures to the dynamic and FCA pseudoensemble forecasts. A common method for evaluating ensemble forecasts is verification rank histograms, also called “Talagrand” diagrams, shown in Fig. 8 for day-3 forecasts at Sayre. A rank histogram is constructed by tallying the number of times the observed value falls into each of the 52 bins defined by an ordering of the 51 ensemble members. If all ensemble members are equally likely, the histogram is flat, with the expected value (and its standard deviation) as indicated by the solid and dashed horizontal lines in the plot, whereas underdispersive or overdispersive ensembles are characterized by U-shaped or dome-shaped histograms, respectively (Hamill and Colucci 1997; Hamill 2001). At most forecast lead times and locations, the verification rank histograms showed generally small deviations from their expected values. In the case shown in Fig. 8, three of the ensembles have rank histograms that do not significantly deviate from their expected values but the S3 Sayre ensemble is an example of an overdispersive ensemble with a characteristic dome-shaped histogram.
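The construction of a verification rank histogram can be sketched in a few lines of Python. The sketch ignores ties between the verification and ensemble members, and the three-member ensembles and verifying values are hypothetical.

```python
def verification_rank(members, verification):
    """Rank of the verification among the ensemble members: 0 if it lies
    below all members, len(members) if it lies above all of them."""
    return sum(1 for m in members if m < verification)

def rank_histogram(ensembles, verifications):
    """Tally verification ranks over many forecasts into n_members + 1 bins."""
    counts = [0] * (len(ensembles[0]) + 1)
    for members, obs in zip(ensembles, verifications):
        counts[verification_rank(members, obs)] += 1
    return counts

ensembles = [[5400.0, 5450.0, 5500.0]] * 4          # hypothetical 3-member ensembles
verifications = [5380.0, 5420.0, 5470.0, 5530.0]    # hypothetical verifying heights (m)
histogram = rank_histogram(ensembles, verifications)
```

An ensemble of 51 members gives 52 bins in the same way. If every bin is equally likely, the expected count per bin is the number of forecasts divided by the number of bins, which for 62 forecasts and 52 bins is only slightly more than one.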

The ranked probability score (RPS) is defined here as

$$\mathrm{RPS} = \frac{1}{T} \sum_{t=1}^{T} \sum_{i=1}^{M} \left(p_{i,t} - \delta_{ij}\right)^2,$$

where $T$ is the number of forecasts, $M$ is the number of bins, $p_{i,t}$ is the forecast probability for forecast $t$ that the verification falls in bin $i$, $\delta_{ij}$ is the Kronecker delta, and $j$ is the observed category (verifying bin) for forecast $t$. The RPS has a range of 0–2, with RPS = 0 for a perfect forecast. Sensitivity tests using 13 and 19 bins showed results that were similar in the relative ranking of the techniques described below. Absolute values of RPS were generally higher (indicating lower skill) for computations with finer bins.
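A short Python sketch shows how such a score can be evaluated. It assumes that the score is the average over forecasts of the summed squared differences between the bin probabilities and an indicator of the verifying bin, a form that is consistent with the stated range of 0–2. The function names and the bin edges are hypothetical, and bin probabilities are taken as the fraction of ensemble members in each bin.

```python
def probability_score(prob_forecasts, observed_bins):
    """Mean over forecasts of sum_i (p[i] - delta(i, j))**2, where j is the
    index of the bin containing the verification (assumed form)."""
    total = 0.0
    for p, j in zip(prob_forecasts, observed_bins):
        total += sum((p_i - (1.0 if i == j else 0.0)) ** 2
                     for i, p_i in enumerate(p))
    return total / len(prob_forecasts)

def bin_probabilities(members, edges):
    """Fraction of ensemble members in each bin defined by ascending edges
    (len(edges) + 1 bins, open-ended at both extremes)."""
    counts = [0] * (len(edges) + 1)
    for m in members:
        counts[sum(1 for e in edges if m >= e)] += 1
    return [c / len(members) for c in counts]

perfect = probability_score([[0.0, 1.0, 0.0]], [1])   # all probability in the verifying bin
worst = probability_score([[1.0, 0.0, 0.0]], [2])     # all probability in a wrong bin
probs = bin_probabilities([5410.0, 5460.0, 5470.0, 5530.0], [5450.0, 5500.0])
```

The two limiting cases give the ends of the range: the score is 0 when all probability is placed in the verifying bin and 2 when it is all placed in a single wrong bin.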

The RPS scores for Sayre are shown in Fig. 9a for all forecast lead times. As is to be expected, the skill of the climatological forecast does not depend on lead time and represents a lower limit of skill. The skill of the dynamical and the other pseudoensembles decreases with lead time. At all lead times, the FCA ensemble (S3) and statistical dressing (feVD) scores are very close, with some indication that feVD appears to be slightly better up to day 5 and equal to or worse than S3 beyond that. However, these differences are not statistically significant.

For an evaluation of the ensemble forecasts at the other verification grid points, RPS scores were aggregated over all points at each latitude. Sample results are shown in Fig. 9b for 42°N latitude, the latitude of Sayre. The error bars shown in the figure assume uncorrelated forecasts and verifications at these points and must be interpreted with caution. The pattern shown in the RPS scores for Sayre is also evident in these aggregated scores. It appears that the RPS is unable to distinguish whether the S3 or feVD ensemble is better.

Evaluation of Talagrand diagrams over multiple grid points cannot be performed by simply aggregating the histograms over multiple grid points, because the correlations between the grid points must be taken into account. Wilks (2004, hereinafter Wil04) and Smith and Hansen (2004) discuss how the method can be generalized to multidimensional data, using minimum spanning tree histograms.

We followed the procedure outlined in Wil04 for computing the minimum spanning tree (MST) rank histograms. The minimum spanning tree length measures the compactness of a collection of points in multidimensional space. In the case of ensemble forecasts for multiple grid points (or variables), each forecast variable and grid point represents one coordinate direction in this space. The MST rank compares the MST length of the $n$ ensemble forecasts with the $n$ MST lengths obtained by replacing each ensemble member in turn by the verifying analysis. The MST rank histogram, when aggregated over many forecasts, is flat for a situation in which the observation has the same distribution as the ensemble members. For underdispersed or overdispersed ensembles the MST rank histograms are asymmetric, with overpopulated bins at the lowest or highest ranks, respectively.

The bias must first be removed from the ensemble forecasts, because the MST rank histograms cannot distinguish between the effects of bias and under- or overdispersiveness. For each ensemble, the forecasts were debiased by removing a bias that depends on season [October and November (ON); December, January, and February (DJF); and March and April (MA)], forecast lead time, and grid point [Eq. (3) of Wil04]. The biases of the eCast and S3 ensembles were very similar, as shown in the example for 3-day forecasts in Fig. 10. All of the 651 grid points within the subdomain defined in the previous section were then used to compute an ensemble mean and covariances about that mean [Eq. (4) of Wil04] for each forecast time period. The deviations of each ensemble member (and the verification) from the ensemble mean were then scaled by the generalized inverse of the full covariance matrix [Eqs. (6)–(7) of Wil04].

The resulting MST rank histograms are shown in Fig. 11 for 5-day forecasts, using full covariance matrix scaling and aggregating results into bins of width 4. The expected values (for the 62-forecast sample) and their expected standard deviations are indicated by the horizontal lines. The histograms are strongly nonuniform for all of the ensemble forecast systems. Aggregating results into bins of width 4 ameliorates problems resulting from the sample size (62). Because our sample is relatively small in comparison with the ensemble size and the size of the covariance matrices estimated, the raw histograms (before aggregation in bins of width 4) are noisy and the expected standard deviations are large. In Fig. 11, it is evident that, while all four ensembles are clearly nonflat, the nonflatness is more pronounced for the feVD and cVD ensembles than for the eCast and S3 ensembles. The situation is similar at the other forecast lead times (not shown).

Smith and Hansen (2004) proposed analyzing the bin populations of MST rank histograms such as those in Fig. 11, in particular their deviations from the expected value, measured in numbers of standard deviations. For a perfect ensemble, the cumulative probability distribution function of these normalized deviations would follow that of a normal distribution with zero mean and unit standard deviation. The results for the rank histograms shown in Fig. 11 are shown in Fig. 12. In Fig. 12 the curves for the eCast and S3 ensembles tend to fall to the left of those for the feVD and cVD ensembles, indicating that they are closer to an ideal ensemble. The cumulative probability distribution functions based on the raw (bin width 1) histograms (not shown) do not allow a distinction between the ensembles.
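The core of the MST rank calculation can be sketched in Python as follows. The sketch covers only the rank itself; the debiasing and covariance scaling steps of Wil04 are omitted, Euclidean distance is assumed, ties are ignored, and the function names and the four-member, two-dimensional example are hypothetical.

```python
import math

def mst_length(points):
    """Total edge length of the Euclidean minimum spanning tree,
    built with Prim's algorithm."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n      # shortest known edge from the tree to each point
    best[0] = 0.0
    total = 0.0
    for _ in range(n):
        # Add the point that is closest to the current tree.
        k = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[k] = True
        total += best[k]
        for i in range(n):
            if not in_tree[i]:
                best[i] = min(best[i], math.dist(points[k], points[i]))
    return total

def mst_rank(members, verification):
    """Rank (1-based) of the all-member MST length among the lengths obtained
    by replacing each member in turn with the verification."""
    base = mst_length(members)
    swapped = [mst_length(members[:i] + [verification] + members[i + 1:])
               for i in range(len(members))]
    return 1 + sum(1 for s in swapped if s < base)

members = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]   # hypothetical ensemble
rank_outlier = mst_rank(members, (10.0, 10.0))   # verification far outside the ensemble
rank_central = mst_rank(members, (0.5, 0.5))     # verification at the ensemble center
```

A verification far outside the ensemble lengthens every tree it joins, so the all-member tree is the shortest and receives the lowest rank; a verification at the center shortens every tree and gives the highest rank. Counts piling up at the lowest rank therefore point to underdispersion and counts at the highest rank to overdispersion.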

It was noted in Wil04 that it is important to consider the full covariance matrix in scaling the deviations from the mean. However, for practical sample sizes the uncertainty of the estimated covariance matrix is large. To test the sensitivity of the results to the specification of the covariance matrix, we also computed MST rank histograms in which the deviations were simply scaled by the standard deviations, which is equivalent to neglecting off-diagonal elements of the covariance matrix [Eq. (5) of Wil04]. The MST rank histograms for this scaling are markedly different (Fig. 13), with larger counts in the higher-ranked bins (uniformly for eCast and S3, and at the extreme bins for feVD and cVD). The corresponding cumulative distribution functions (Fig. 14) are also very different from those using the full covariance matrix scaling. The curves in Fig. 14 would appear to identify eCast and S3 more clearly as superior to feVD and cVD than in Fig. 12.

## 4. Summary and concluding remarks

The results of our pilot study demonstrate that for large-scale features such as 500-hPa geopotential height it is feasible to generate pseudoensembles of weather forecasts from one deterministic forecast and perturbations generated by randomly sampling FCA displacements based on a priori statistics.

Consistency checks using the low-resolution Northern Hemisphere 500-hPa geopotential height forecasts and analyses used in our earlier QJ03 study showed that ensembles with spreads of the approximately correct magnitude can be generated by using the displacement component of FCA alone. Limitations of the technique were noted for low latitudes, where height gradients are small.

The results of the ensemble forecast verifications using the higher-resolution 500-hPa geopotential height forecasts over North America demonstrate that the FCA ensemble technique is performing well at middle latitudes. Comparison with actual dynamical ensembles generated by ECMWF showed that important features of the dynamical ensemble can be approximated by the FCA pseudoensemble. Of particular interest is that, at short to intermediate forecast lead times, coherent horizontal structures of the ensemble variance field are similar for the dynamic and FCA ensembles. This condition could not be achieved with a simple statistical dressing of the unperturbed ECMWF forecast. Some limitations of the FCA ensembles were found to be caused by the use of displacement-only perturbations, which limits the spread of the ensemble near local extrema and generally in areas with small gradients. We discuss mitigation approaches for this issue below.
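The dependence of displacement-only spread on the local gradient can be seen in a one-dimensional toy example. To first order, displacing a field f by a small amount d changes it by about d times the local gradient, so the ensemble variance scales with the squared gradient and nearly vanishes at extrema. The sketch below is an idealization, not the FCA implementation used in this study: the sine-wave "height" field, the three Fourier modes, and the coefficient standard deviation of 0.1 are all assumed for illustration.

```python
import math
import random
import statistics

def field(x):
    """Idealized periodic 'height' field on [0, 2*pi)."""
    return math.sin(x)

def random_displacement(rng, n_modes=3, coeff_std=0.1):
    """Smooth random displacement d(x) built from a few Fourier modes.

    Assumption: independent Gaussian coefficients with a common standard deviation.
    """
    a0 = rng.gauss(0.0, coeff_std)
    coeffs = [(rng.gauss(0.0, coeff_std), rng.gauss(0.0, coeff_std))
              for _ in range(n_modes)]
    def d(x):
        return a0 + sum(a * math.cos((k + 1) * x) + b * math.sin((k + 1) * x)
                        for k, (a, b) in enumerate(coeffs))
    return d

def pseudo_ensemble(x_points, n_members, seed=0):
    """Apply independent random displacements to the single 'forecast' field."""
    rng = random.Random(seed)
    members = []
    for _ in range(n_members):
        d = random_displacement(rng)
        members.append([field(x - d(x)) for x in x_points])
    return members

x_points = [0.0, math.pi / 2]   # steepest gradient, then local maximum
members = pseudo_ensemble(x_points, n_members=200)
var_gradient = statistics.pvariance([m[0] for m in members])
var_extremum = statistics.pvariance([m[1] for m in members])
print(f"variance at steepest gradient: {var_gradient:.4f}")
print(f"variance at local maximum:     {var_extremum:.4f}")
```

The variance at the local maximum is much smaller than at the steepest gradient, even though the displacement statistics are identical at both points.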

For the pilot study data examined here, verification of the ensemble forecasts using ranked probability scores and Talagrand diagrams does not indicate that the FCA ensemble verifies consistently better than a simple statistical dressing of the ECMWF forecast. An analysis using minimum spanning tree histograms, however, showed some indications that eCast and S3 are superior to the feVD and cVD ensembles. The lack of differentiation in terms of the RPS may partially be caused by the neglect of the bias component and of the correlations between FCA spectral coefficients when generating the FCA ensembles. Other limitations specific to the pilot study data are 1) that the FCA adjustments were based on statistics derived from a different forecast model version (1989/90 vs 2003/04) at lower resolution and 2) that forecast error statistics were assumed to be equivalent to forecast difference statistics.
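As a reminder of what the ranked probability score measures, the sketch below evaluates it for a single ensemble forecast: the squared difference between the forecast and observed cumulative probabilities is summed over category thresholds. The thresholds, member values, and the unnormalized form of the score are assumptions for this example; the category definitions used in this study may differ, and some authors divide the sum by the number of thresholds.

```python
def ranked_probability_score(members, observation, thresholds):
    """Unnormalized RPS for one forecast; lower is better, 0 is perfect.

    thresholds: ascending category boundaries (hypothetical values in the demo).
    """
    n = len(members)
    score = 0.0
    for t in thresholds:
        forecast_cdf = sum(1 for m in members if m <= t) / n   # P(forecast <= t)
        observed_cdf = 1.0 if observation <= t else 0.0        # step function at the observation
        score += (forecast_cdf - observed_cdf) ** 2
    return score

thresholds = [5390.0, 5410.0, 5430.0]             # 500-hPa heights (m), illustrative only
spread_out = [5380.0, 5400.0, 5420.0, 5440.0]     # dispersed ensemble
sharp = [5405.0, 5405.0, 5405.0, 5405.0]          # every member in the observed category

print(ranked_probability_score(spread_out, 5405.0, thresholds))
print(ranked_probability_score(sharp, 5405.0, thresholds))
```

Because the score depends only on category probabilities at a point, two ensembles with different spatial structure can receive similar RPS values, which is consistent with the limited differentiation noted above.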

We note several potential refinements to the generation and applications of the FCA adjustments that might improve the FCA ensemble performance. The bias and random FCA components should be used as well as the displacement component. Much larger samples would allow estimating correlations between spectral coefficients and between FCA components and features in the meteorological fields. Larger samples would also permit more definitive statistical analysis of the results. FCA error statistics might usefully be stratified by season, geography, or synoptic situation. The dressing kernel suggested by Wang and Bishop (2005) could be used to treat the residual variance not captured by the FCA ensemble.

One possible use of FCA-generated ensembles is in dispersion modeling. In some common cases, the misplacement of large-scale features such as a cold front can give rise to a large part of the uncertainty of NWP forecast parameters of interest to dispersion modeling, including stability and wind speed and direction. Such probabilistic NWP information is needed for the generation of critical forecasts of dosage resulting from releases of chemical, biological, or radiological agents. For example, uncertainty information is used to produce probabilistic dispersion forecasts in the Second-Order Closure Integrated Puff model (SCIPUFF; Sykes et al. 1998), the dispersion module of the Hazard Prediction and Assessment Capability (HPAC) used by the Defense Threat Reduction Agency. Within SCIPUFF, subgrid-scale variability is modeled internally through a turbulence parameterization and larger-scale (synoptic/mesoscale) variability is specified in terms of standard deviations and a length scale parameter. Although designed to represent the variability resulting from scales not represented by the input meteorological data, these parameters can also be used to represent the variability resulting from the uncertainty of the input data.

With increased processing power, the forecast uncertainty information may be provided by using ensembles of forecasts. However, when speed and efficiency requirements are stringent, for example, to support real-time applications of HPAC in the field, statistical ensemble dressing techniques may prove useful. For this purpose one would derive offline difference statistics for ensembles of high-resolution mesoscale forecasts in terms of the components of FCA and then would use these statistics in near–real time to provide ensemble input for HPAC or to define the variability parameters used as input to HPAC as a function of time and position. For typical mesoscale applications, there are only sufficient verification data at the surface and some assumptions must be introduced to determine the three-dimensional FCA fields. The simplest assumption is that there is no vertical variation. As an alternative, FCA training could be accomplished using three-dimensional difference fields between pairs of forecasts, with the results then scaled so as to obtain agreement at the surface with differences between forecasts and observations.

## Acknowledgments

This research was supported by Defense Threat Reduction Agency Contract HDTRA1-05-P-0031. The authors thank their colleagues Steven Hanna and James Hansen for helpful discussions and our contract monitor Stephanie Hamilton for continued support. We thank the NASA Goddard Data Assimilation Office and ECMWF for providing the Lorenz datasets. A detailed constructive review by an anonymous reviewer was very helpful.

## REFERENCES

Anderson, S. L., 1990: Random number generators on vector supercomputers and other advanced architectures. *SIAM Rev.*, **32**, 221–251.

De Elía, R., and R. Laprise, 2003: Distribution-oriented verification of limited-area model forecasts in a perfect-model framework. *Mon. Wea. Rev.*, **131**, 2492–2509.

Hamill, T. M., 2001: Interpretation of rank histograms for verifying ensemble forecasts. *Mon. Wea. Rev.*, **129**, 550–560.

Hamill, T. M., and S. J. Colucci, 1997: Verification of Eta-RSM short-range ensemble forecasts. *Mon. Wea. Rev.*, **125**, 1312–1327.

Hoffman, R. N., and C. Grassotti, 1996: A technique for assimilating SSM/I observations of marine atmospheric storms. *J. Appl. Meteor.*, **35**, 1177–1188.

Hoffman, R. N., Z. Liu, J-F. Louis, and C. Grassotti, 1995: Distortion representation of forecast errors. *Mon. Wea. Rev.*, **123**, 2758–2770.

Nehrkorn, T., R. N. Hoffman, C. Grassotti, and J-F. Louis, 2003: Feature calibration and alignment to represent model forecast errors: Empirical regularization. *Quart. J. Roy. Meteor. Soc.*, **129**, 195–218.

Roulston, M. S., and L. A. Smith, 2003: Combining dynamical and statistical ensembles. *Tellus*, **55A**, 16–30.

Smith, L. A., and J. A. Hansen, 2004: Extending the limits of ensemble forecast verification with the minimum spanning tree. *Mon. Wea. Rev.*, **132**, 1522–1528.

Sykes, R. I., S. F. Parker, D. S. Henn, C. P. Cerasoli, and L. P. Santos, 1998: PC-SCIPUFF version 1.2PD, technical documentation. Titan Research and Technology Division, Titan Corporation, ARAP Rep. 718, 172 pp. [Available online at http://www.titan.com.]

Wang, X., and C. H. Bishop, 2004: Ensemble augmentation with a new dressing kernel. Preprints, *20th Conf. on Weather Analysis and Forecasting/16th Conf. on Numerical Weather Prediction*, Seattle, WA, Amer. Meteor. Soc., CD-ROM, J6.4.

Wang, X., and C. H. Bishop, 2005: Improvement of ensemble reliability with a new dressing kernel. *Quart. J. Roy. Meteor. Soc.*, **131**, 965–986.

White, B. G., J. Paegle, W. J. Steenburgh, J. D. Horel, R. T. Swanson, L. K. Cook, D. J. Onton, and J. G. Miles, 1999: Short-term forecast validation of six models. *Wea. Forecasting*, **14**, 84–108.

Wilks, D. S., 2004: The minimum spanning tree histogram as a verification tool for multidimensional ensemble forecasts. *Mon. Wea. Rev.*, **132**, 1329–1340.

Ensemble (a) mean and (b) variance for eCast and (c) mean and (d) variance for S3 for the 3-day 500-hPa geopotential height forecast from 1200 UTC 28 Nov 2003. The latitude and longitude lines corresponding to the ordinate of Figs. 3–5 are indicated by dashed lines, and the location of Sayre is marked with an open circle. The ensemble mean 5400-m height contour is highlighted and is shown in both the ensemble mean and variance plots.

Citation: Journal of Applied Meteorology and Climatology 45, 11; 10.1175/JAM2428.1

Ensemble forecast scatterplots along constant latitude line 42°N for (a) eCast, (b) S3, (c) feVD, and (d) cVD for the 3-day 500-hPa geopotential height forecast from 1200 UTC 28 Nov 2003. Special symbols are used to highlight the unperturbed forecast (open circles), the verification (triangles), and one particular ensemble member (times signs). The horizontal axes are labeled by gridpoint number along the line (going from west to east). The separation between grid points is 1.5°. The location of Sayre is indicated by the arrow.

As in Fig. 3, but for 57°N latitude for (a) eCast and (b) S3 only.

As in Fig. 3, but for 76.5°W longitude for (a) eCast and (b) S3 only.

Time-averaged spatial correlation between eCast and other ensemble (a) mean and (b) variance fields for the 1–10-day 500-hPa geopotential height forecasts. The eCast correlations with S3 (open circles), feVD (triangles), and cVD (times signs) are calculated over the North American subdomain and then averaged over all 62 forecasts.

Spatial correlation between eCast and other ensemble variance fields for the 3-day 500-hPa geopotential height forecasts. The eCast correlations with S3 (open circles), feVD (triangles), and cVD (times signs) are shown for all 62 forecasts. The Thanksgiving case is indicated by the vertical dashed line.

Verification rank histograms for (a) eCast, (b) S3, (c) feVD, and (d) cVD for the 3-day 500-hPa geopotential height ensemble forecasts at Sayre. The 52 raw ranking values are aggregated to bins of width 4 for graphical display. Expected values for a flat histogram are shown by the solid horizontal line, and the expected standard deviations are shown by the dashed horizontal lines.

Ranked probability scores as a function of forecast day (a) for Sayre and (b) for all grid points at 42°N latitude (the latitude of Sayre) for eCast (open diamonds), S3 (open circles), feVD (open triangles), and cVD (times signs) 500-hPa geopotential height ensemble forecasts. Error bars represent one standard deviation from a bootstrap estimate using 250 realizations.

Ensemble mean bias of the (a) eCast and (b) S3 for the 3-day winter 500-hPa geopotential height forecasts.

MST rank histograms for (a) eCast, (b) S3, (c) feVD, and (d) cVD for the 3-day 500-hPa geopotential height ensemble forecasts. Counts are shown for bins of width 4 and full covariance matrix scaling.

Cumulative probability distribution functions for eCast (open diamonds), S3 (open circles), feVD (open triangles), and cVD (times signs) derived from the MST rank histograms of Fig. 11.

MST rank histograms for (a) eCast, (b) S3, (c) feVD, and (d) cVD as in Fig. 11, but using diagonal covariance matrix scaling.

Cumulative probability distribution functions for eCast (open diamonds), S3 (open circles), feVD (open triangles), and cVD (times signs) derived from the MST rank histograms of Fig. 13.

Globally averaged rms statistics of 500-hPa height errors (m) averaged over all 30 forecasts and, in the case of S1 and S2, over all ensemble members. Column “fhr” is the forecast hour, “err” is the error of the deterministic forecasts, and “FCA” is the error explained by the FCA for those forecasts.