# Search Results

## You are looking at 1–10 of 12 items for

- Author or Editor: Elizabeth Satterfield

## Abstract

The ability of an ensemble to capture the magnitude and spectrum of uncertainty in a local linear space spanned by the ensemble perturbations is assessed. Numerical experiments are carried out with a reduced-resolution 2004 version of the model component of the National Centers for Environmental Prediction (NCEP) Global Forecast System (GFS). The local ensemble transform Kalman filter (LETKF) data assimilation system is used to assimilate observations in three steps, gradually adding more realistic features to the observing network. In the first experiment, randomly placed, noisy, simulated vertical soundings, which provide 10% coverage of horizontal model grid points, are assimilated. Next, the impact of an inhomogeneous observing system is introduced by assimilating simulated observations in the locations of real observations of the atmosphere. Finally, observations of the real atmosphere are assimilated.

The most important findings of this study are the following: predicting the magnitude of the forecast uncertainty and the relative importance of the different patterns of uncertainty is, in general, a more difficult task than predicting the patterns of uncertainty; the ensemble, which is tuned to provide near-optimal performance at analysis time, underestimates not only the total magnitude of the uncertainty, but also the magnitude of the uncertainty that projects onto the space spanned by the ensemble perturbations; and finally, a strong predictive linear relationship is found between the local ensemble spread and the upper bound of the local forecast uncertainty.
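The projection diagnostic described above can be illustrated with a minimal Python sketch. All data here are synthetic; the state dimension, ensemble size, and random errors are illustrative assumptions, not the paper's GFS configuration. The forecast error is regressed onto the ensemble-perturbation subspace, and the explained fraction measures how much of the uncertainty projects onto the space spanned by the perturbations:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 40, 10                      # state dimension, ensemble size (illustrative)
ens = rng.normal(size=(n, k))      # synthetic ensemble members
mean = ens.mean(axis=1, keepdims=True)
X = ens - mean                     # ensemble perturbations (span the local linear space)
err = rng.normal(size=n)           # synthetic forecast error (truth minus ensemble mean)

# Least-squares projection of the error onto the perturbation subspace
coef, *_ = np.linalg.lstsq(X, err, rcond=None)
proj = X @ coef
explained = 1.0 - np.sum((err - proj) ** 2) / np.sum(err ** 2)

spread = np.sqrt(np.trace(X @ X.T) / (k - 1))  # total ensemble spread
print(f"fraction of error variance in ensemble subspace: {explained:.2f}")
print(f"total ensemble spread: {spread:.2f}")
```

With k members the perturbations span at most k − 1 directions, so in a 40-dimensional state only a modest fraction of an unrelated error vector can be explained; a well-constructed ensemble raises this fraction by aligning its perturbations with the directions of actual uncertainty.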


## Abstract

The performance of an ensemble prediction system is inherently flow dependent. This paper investigates the flow dependence of the ensemble performance with the help of linear diagnostics applied to the ensemble perturbations in a small local neighborhood of each model gridpoint location ℓ. A local error covariance matrix 𝗣_{ℓ} is defined for each local region, and the diagnostics are applied to the linear space spanned by the ensemble perturbations in that region. The particular diagnostics are chosen to help investigate the efficiency of the ensemble in capturing the forecast uncertainty.

Numerical experiments are carried out with an implementation of the local ensemble transform Kalman filter (LETKF) data assimilation system on a reduced-resolution [T62 with 28 vertical levels (T62L28)] version of the National Centers for Environmental Prediction (NCEP) Global Forecast System (GFS). Both simulated observations under a perfect-model scenario and observations of the real atmosphere in a realistic setting are used in these experiments. It is found that, paradoxically, the *E* dimension (a measure of the effective dimensionality of the linear space spanned by the ensemble perturbations) is a reliable predictor of the performance of the ensemble.


## Abstract

Ensemble variances provide a prediction of the flow-dependent error variance of the ensemble mean or, possibly, a high-resolution forecast. However, small ensemble size, unaccounted-for model error, and imperfections in ensemble generation schemes cause the predictions of error variance to be imperfect. In previous work, the authors developed an analytic approximation to the posterior distribution of true error variances, given an imperfect ensemble prediction, based on parameters recovered from long archives of innovation and ensemble variance pairs. This paper shows how heteroscedastic postprocessing enables climatological information to be blended with ensemble forecast information when information about the distribution of true error variances given an ensemble sample variance is available. A hierarchy of postprocessing methods is described, each graded on the amount of information about the posterior distribution of error variances used in the postprocessing. These postprocessing methods are used to assess the value of knowledge of the mean and variance of the posterior distribution of error variances to ensemble postprocessing and explore sensitivity to various parameter regimes. Testing was performed using both synthetic data and operational ensemble forecasts of a Gaussian-distributed variable, to provide a proof-of-concept demonstration in a semi-idealized framework. Rank frequency histograms, weather roulette, continuous ranked probability score, and spread-skill diagrams are used to quantify the value of information about the posterior distribution of error variances. It is found that ensemble postprocessing schemes that utilize the full distribution of error variances given the ensemble sample variance outperform those that do not.
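One of the verification measures named above, the continuous ranked probability score (CRPS), has a well-known closed form for a Gaussian forecast distribution. The sketch below implements that standard formula (it is generic textbook material, not code from the paper) and shows that, for the same miss, a sharper Gaussian forecast scores better:

```python
import math

def crps_gaussian(mu, sigma, obs):
    """Closed-form CRPS for a Gaussian forecast N(mu, sigma^2) and a scalar obs."""
    z = (obs - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)   # standard normal pdf
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))          # standard normal cdf
    return sigma * (z * (2.0 * cdf - 1.0) + 2.0 * pdf - 1.0 / math.sqrt(math.pi))

wide = crps_gaussian(0.0, 1.0, 0.5)    # wide forecast
sharp = crps_gaussian(0.0, 0.5, 0.5)   # sharper forecast, same error of the mean
print(wide, sharp)
```

Lower CRPS is better; averaging such scores over many forecast and observation pairs is how the comparison in the abstract is quantified.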


## Abstract

A conundrum of predictability research is that while the prediction of flow-dependent error distributions is one of its main foci, chaos fundamentally hides flow-dependent forecast error distributions from empirical observation. Empirical estimation of such error distributions requires a large sample of error realizations *given the same flow-dependent conditions*. However, chaotic elements of the flow and the observing network make it impossible to collect a large enough conditioned error sample to empirically define such distributions and their variance. Such conditional variances are “hidden.” Here, an exposition of the problem is developed from an ensemble Kalman filter data assimilation system applied to a 10-variable nonlinear chaotic model and 25 000 replicate models. The 25 000 replicates *reveal* the error variances that would otherwise be hidden. It is found that the inverse-gamma distribution accurately approximates the *posterior* distribution of conditional error variances *given an imperfect ensemble variance* and provides a reasonable approximation to the *prior* climatological distribution of conditional error variances. A new analytical model shows how the properties of a *likelihood* distribution of ensemble variances given a true conditional error variance determine the *posterior* distribution of error variances given an ensemble variance. The analytically generated distributions are shown to satisfactorily fit empirically determined distributions. The theoretical analysis yields a rigorous interpretation and justification of hybrid error variance models that linearly combine static and flow-dependent estimates of forecast error variance; in doing so, it also helps justify and inform hybrid error covariance models.
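The "hidden variance" effect can be reproduced with a few lines of synthetic Python. The lognormal climatology of true variances, the 8-member ensemble, and the chi-square sampling model below are illustrative assumptions, not the 10-variable model of the paper. Conditioning on a narrow bin of ensemble variance exposes a right-skewed spread of true error variances, the shape that an inverse-gamma distribution approximates:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 200_000
# Climatological distribution of true conditional error variances (assumed lognormal)
true_var = rng.lognormal(mean=0.0, sigma=0.5, size=N)
# Imperfect K-member ensemble: the sample variance is chi-square distributed about true_var
K = 8
ens_var = true_var * rng.chisquare(K - 1, size=N) / (K - 1)

# Condition on a narrow bin of ensemble variance and inspect the hidden true variances
sel = (ens_var > 0.95) & (ens_var < 1.05)
cond = true_var[sel]
skew = ((cond - cond.mean()) ** 3).mean() / cond.std() ** 3
print(f"n in bin: {sel.sum()}, mean true var: {cond.mean():.2f}, skewness: {skew:.2f}")
```

In a single realization of reality each flow condition occurs once, so this conditional histogram cannot be collected empirically; here the replicated synthetic draws play the role of the paper's 25 000 replicate models.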


## Abstract

The statistics of model temporal variability ought to be the same as those of the filtered version of reality that the model is designed to represent. Here, simple diagnostics are introduced to quantify temporal variability on different time scales and are then applied to NCEP and CMC global ensemble forecasting systems. These diagnostics enable comparison of temporal variability in forecasts with temporal variability in the initial states from which the forecasts are produced. They also allow for an examination of how day-to-day variability in the forecast model changes as forecast integration time increases. Because the error in subsequent analyses will differ, it is shown that forecast temporal variability should lie between corresponding analysis variability and analysis variability minus twice the analysis error variance. This expectation is not always met and possible causes are discussed. The day-to-day variability in NCEP forecasts decreases slowly and steadily as forecast time increases. In contrast, temporal variability increases during the first few days in the CMC control forecasts, and then levels off, consistent with a spinup of the forecasts starting from overly smoothed analyses. The diagnostics successfully reflect a reduction in the temporal variability of the CMC perturbed forecasts after a system upgrade. The diagnostics also illustrate a shift in variability maxima from storm-track regions for 1-day variability to blocking regions for 10-day variability. While these patterns are consistent with previous studies examining temporal variability on different time scales, they have the advantage of being obtainable without the need for extended (e.g., multimonth) forecast integrations.
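The stated bound, analysis variability minus twice the analysis error variance, follows if analysis errors are independent in time, since then Var(a_{t+1} − a_t) = Var(x_{t+1} − x_t) + 2σ_e². A synthetic AR(1) sketch (the autocorrelation, error variance, and series length are arbitrary illustrative choices, not the NCEP or CMC systems) verifies the relation numerically:

```python
import numpy as np

rng = np.random.default_rng(2)
T, phi, err_var = 100_000, 0.9, 0.25
# AR(1) "truth" with unit marginal variance, standing in for the filtered atmosphere
w = rng.normal(scale=np.sqrt(1 - phi**2), size=T)
truth = np.empty(T)
truth[0] = w[0]
for t in range(1, T):
    truth[t] = phi * truth[t - 1] + w[t]

# Analyses = truth plus temporally independent analysis errors
analysis = truth + rng.normal(scale=np.sqrt(err_var), size=T)

var_truth_diff = np.var(np.diff(truth))    # day-to-day variability of the perfect forecast
var_anal_diff = np.var(np.diff(analysis))  # day-to-day variability of the analyses
print(var_anal_diff, var_truth_diff, var_anal_diff - 2 * err_var)
```

The perfect-model forecast variability matches the truth variability, which sits below the analysis variability by exactly twice the analysis error variance; correlated analysis errors would move the forecast variability between the two bounds quoted in the abstract.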


## Abstract

In Part I of this study, a model of the distribution of true error variances given an ensemble variance is shown to be defined by six parameters that also determine the optimal weights for the static and flow-dependent parts of hybrid error variance models. Two of the six parameters (the climatological mean of forecast error variance and the climatological minimum of ensemble variance) are straightforward to estimate. The other four parameters are (i) the variance of the climatological distribution of the true conditional error variances, (ii) the climatological minimum of the true conditional error variance, (iii) the relative variance of the distribution of ensemble variances given a true conditional error variance, and (iv) the parameter that defines the mean response of the ensemble variances to changes in the true error variance. These parameters are *hidden* because they are defined in terms of *condition*-dependent forecast error variance, which is unobservable if the *condition* is not sufficiently repeatable. Here, a set of equations that enable these hidden parameters to be accurately estimated from a long time series of (observation minus forecast, ensemble variance) data pairs is presented. The accuracy of the equations is demonstrated in tests using data from long data assimilation cycles with differing model error variance parameters as well as synthetically generated data. This newfound ability to estimate these hidden parameters provides new tools for assessing the quality of ensemble forecasts, tuning hybrid error variance models, and postprocessing ensemble forecasts.
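A flavor of the estimation problem can be shown with synthetic (observation minus forecast, ensemble variance) pairs; the gamma climatology, chi-square sampling model, and observation error variance below are assumptions for illustration, not the paper's parameter model. A naive regression of squared innovations on ensemble variance recovers an attenuated slope, exactly the kind of bias that motivates solving for the hidden parameters rather than reading them off a fit:

```python
import numpy as np

rng = np.random.default_rng(3)
N, obs_var = 500_000, 0.1
# Hidden conditional error variances (gamma climatology is an illustrative assumption)
true_var = rng.gamma(shape=4.0, scale=0.25, size=N)
# Imperfect ensemble variances: unbiased but noisy (chi-square sampling model)
ens_var = true_var * rng.chisquare(9, size=N) / 9
# Innovations: forecast error plus independent observation noise
innov = rng.normal(scale=np.sqrt(true_var + obs_var))

# Regression of squared innovations on ensemble variance: the slope comes out
# well below 1 because sampling noise in ens_var dilutes the fit (errors in
# variables), even though E[ens_var | true_var] = true_var exactly.
A = np.column_stack([np.ones(N), ens_var])
(intercept, slope), *_ = np.linalg.lstsq(A, innov**2, rcond=None)
print(f"intercept {intercept:.3f}, slope {slope:.3f}")
```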


## Abstract

Data assimilation schemes combine observational data with a short-term model forecast to produce an analysis. However, many characteristics of the atmospheric states described by the observations and the model differ. Observations often measure a higher-resolution state than coarse-resolution model grids can describe. Hence, the observations may measure aspects of gradients or unresolved eddies that are poorly resolved by the filtered version of reality represented by the model. This inconsistency, known as observation representation error, must be accounted for in data assimilation schemes. In this paper the ability of the ensemble to predict the variance of the observation error of representation is explored, arguing that the portion of representation error being detected by the ensemble variance is that portion correlated to the smoothed features that the coarse-resolution forecast model is able to predict. This predictive relationship is explored using differences between model states and their spectrally truncated form, as well as commonly used statistical methods to estimate observation error variances. It is demonstrated that the ensemble variance is a useful predictor of the observation error variance of representation and that it could be used to account for flow dependence in the observation error covariance matrix.
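The spectral-truncation view of representation error is easy to mimic in one dimension. The red-spectrum synthetic field, grid size, and truncation wavenumber below are arbitrary illustrative choices: the part of the signal that the truncated "model" cannot represent is exactly what a pointwise observation sees as representation error:

```python
import numpy as np

rng = np.random.default_rng(4)
n, keep = 512, 32                               # fine grid points, retained wavenumbers
# Synthetic red-spectrum field standing in for the high-resolution real atmosphere
spec = rng.normal(size=n // 2 + 1) + 1j * rng.normal(size=n // 2 + 1)
spec /= np.maximum(np.arange(n // 2 + 1), 1.0)  # roughly k^-1 amplitude spectrum
spec[0] = 0.0                                   # zero-mean field
field = np.fft.irfft(spec, n)

trunc = spec.copy()
trunc[keep:] = 0.0                              # spectral truncation: the model's filtered reality
model_view = np.fft.irfft(trunc, n)

rep_err = field - model_view                    # seen by a point observation, invisible to the model
print(np.var(rep_err), np.var(field))
```

Because distinct Fourier modes are orthogonal, the field variance splits exactly into the resolved part and the representation-error part; in the paper's flow-dependent setting, the size of that unresolved part varies with the smoothed features the model does predict.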


## Abstract

A new multiscale, ensemble-based data assimilation (DA) method, the multiscale local gain form ensemble transform Kalman filter (MLGETKF), is introduced. MLGETKF allows simultaneous update of multiple scales for both the ensemble mean and perturbations through assimilating all observations at once. MLGETKF performs DA in independent local volumes, which lends the algorithm a high degree of computational scalability. The multiscale analysis is enabled through the rapid creation of many pseudoensemble perturbations via a multiscale ensemble modulation procedure. The Kalman gain that is used to update the raw background ensemble mean and perturbations is based on this modulated ensemble, which intrinsically includes multiscale model space localization. Experiments with a noncycled statistical model show that the full background covariance estimated by MLGETKF more accurately resembles the shape of the true covariance than a scale-unaware localization. The mean analysis from the best-performing MLGETKF is statistically significantly more accurate than the best-performing scale-unaware LGETKF. The accuracy of the MLGETKF analysis is more sensitive to the localization radius of the small-scale band than to that of the large-scale band. MLGETKF is further examined in a cycling DA context with a surface quasigeostrophic model. The root-mean-square potential temperature analysis error of the best-performing MLGETKF is 17.2% lower than that of the best-performing LGETKF. MLGETKF reduces analysis errors measured in kinetic energy spectral space by 30%–80% relative to LGETKF, with the largest improvement at large scales. MLGETKF deterministic and ensemble mean forecasts are more accurate than LGETKF for full and large scales up to 5–6-day lead time and for small scales up to 3–4-day lead time, gaining ~12 h to 1 day of predictability.
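The modulation idea behind the pseudoensemble can be sketched in one dimension. This shows only the single-scale modulation product with a Gaussian localization matrix as an assumed ingredient; the paper's multiscale band decomposition is not reproduced. Elementwise products of localization eigenvectors with the raw perturbations yield a much larger modulated ensemble whose sample covariance equals the localized covariance exactly:

```python
import numpy as np

rng = np.random.default_rng(5)
n, K = 30, 5
X = rng.normal(size=(n, K))                      # raw ensemble perturbations
P = (X @ X.T) / (K - 1)                          # raw, rank-deficient covariance

# Gaussian localization matrix (assumed form) and its eigenpairs
d = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
C = np.exp(-((d / 6.0) ** 2))
lam, U = np.linalg.eigh(C)

# Modulation: each localization eigenvector elementwise-multiplies each member,
# giving n*K pseudo-members whose covariance is exactly the Schur product C ∘ P
Z = np.concatenate(
    [np.sqrt(max(l, 0.0)) * (u[:, None] * X) for l, u in zip(lam, U.T)], axis=1
)
P_loc = (Z @ Z.T) / (K - 1)
print(np.allclose(P_loc, C * P))
```

The modulated ensemble raises the rank of the background covariance well above K − 1, which is what lets a gain built from it apply model-space localization intrinsically.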


## Abstract

Ensemble postprocessing is frequently applied to correct biases and deficiencies in the spread of ensemble forecasts. Methods involving weighted, regression-corrected forecasts address the typical biases and underdispersion of ensembles through a regression correction of ensemble members followed by the generation of a probability density function (PDF) from the weighted sum of kernels fit around each corrected member. The weighting step accounts for the situation where the ensemble is constructed from different model forecasts or generated in some way that creates ensemble members that do not represent equally likely states. In the present work, it is shown that an overweighting of climatology in weighted, regression-corrected forecasts can occur when one first performs a regression-based correction before weighting each member. This overweighting of climatology results in an increase in the mean-squared error of the mean of the predicted PDF. The overweighting of climatology is illustrated in a simulation study and a real-data study, where the reference is generated through a direct application of Bayes’s rule. The real-data example is a comparison of a particular method referred to as Bayesian model averaging (BMA) and a direct application of Bayes’s rule for ocean wave heights using U.S. Navy and National Weather Service global deterministic forecasts. This direct application of Bayes’s rule is shown not to overweight climatology and may be a low-cost replacement for the generally more expensive weighted, regression-correction methods.
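For a Gaussian climatological prior and a Gaussian forecast likelihood, a direct application of Bayes's rule reduces to a precision-weighted blend. The sketch below is the generic conjugate-Gaussian update with arbitrary illustrative numbers, not the paper's wave-height system:

```python
def bayes_blend(clim_mean, clim_var, fcst, fcst_err_var):
    """Posterior from a Gaussian climatological prior and a Gaussian forecast likelihood."""
    w = clim_var / (clim_var + fcst_err_var)          # weight on the forecast
    mean = clim_mean + w * (fcst - clim_mean)         # precision-weighted posterior mean
    var = 1.0 / (1.0 / clim_var + 1.0 / fcst_err_var) # posterior variance
    return mean, var

m, v = bayes_blend(2.0, 4.0, 3.5, 1.0)
print(m, v)
```

Here the weight on climatology is fixed by the two variances alone, which is why this construction cannot overweight climatology the way a regression correction applied before member weighting can.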


## Abstract

Because of imperfections in ensemble data assimilation schemes, one cannot assume that the ensemble-derived covariance matrix is equal to the true error covariance matrix. Here, we describe a simple and intuitively compelling method to fit calibration functions of the ensemble sample variance to the mean of the distribution of true error variances, given an ensemble estimate. We demonstrate that the use of such calibration functions is consistent with theory showing that, when sampling error in the prior variance estimate is considered, the gain that minimizes the posterior error variance uses the expected true prior variance, given an ensemble sample variance. Once the calibration function has been fitted, it can be combined with ensemble-based and climatologically based error correlation information to obtain a generalized hybrid error covariance model. When the calibration function is chosen to be a linear function of the ensemble variance, the generalized hybrid error covariance model is the widely used linear hybrid consisting of a weighted sum of a climatological and an ensemble-based forecast error covariance matrix. However, when the calibration function is chosen to be, say, a cubic function of the ensemble sample variance, the generalized hybrid error covariance model is a nonlinear function of the ensemble estimate. We consider idealized univariate data assimilation and multivariate cycling ensemble data assimilation to demonstrate that the generalized hybrid error covariance model closely approximates the optimal weights found through computationally expensive tuning in the linear case and, in the nonlinear case, outperforms any plausible linear model.
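The calibration-function idea can be sketched with synthetic data. The quadratic response and gamma-distributed ensemble variances below are assumptions for illustration (the paper's nonlinear example is cubic): fit the conditional mean of the squared error given the ensemble variance with a low-order polynomial, then use that fitted function in place of the raw ensemble variance:

```python
import numpy as np

rng = np.random.default_rng(6)
N = 400_000
ens_var = rng.gamma(3.0, 0.3, size=N)       # ensemble sample variances
# Assumed nonlinear response of the true error variance to the ensemble estimate
true_var = 0.2 + 0.5 * ens_var + 0.3 * ens_var**2
err = rng.normal(scale=np.sqrt(true_var))   # forecast errors drawn with that variance

# Calibration function: fit E[err^2 | ens_var] with a polynomial in ens_var
coef = np.polyfit(ens_var, err**2, deg=2)   # coefficients, highest power first
print(coef)
```

When the fitted calibration is linear, this reduces to the familiar weighted sum of static and ensemble-based variances; a higher-order fit gives the nonlinear generalized hybrid described in the abstract.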
