Abstract
A variety of ad hoc procedures have been developed to prevent filter divergence in ensemble-based data assimilation schemes. These procedures are necessary to reduce the impacts of sampling errors in the background error covariance matrix derived from a limited-size ensemble. The procedures amount to the introduction of additional noise into the assimilation process, possibly reducing the accuracy of the resulting analyses. The effects of this noise on analysis and forecast performance are investigated in a perfect model scenario. Alternative schemes aimed at controlling the unintended injection of noise are proposed and compared. Improved analysis and forecast accuracy is observed in schemes with minimal alteration to the evolving ensemble-based covariance structure.
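As a minimal illustration (not drawn from the paper above) of the kind of ad hoc procedure in question, the sketch below applies multiplicative covariance inflation to an ensemble before the analysis step; the inflation factor, ensemble layout, and NumPy-based setup are assumptions made for illustration only.

```python
import numpy as np

def inflate_ensemble(ensemble, rho=1.05):
    """Multiplicative covariance inflation (illustrative sketch).

    ensemble : (n_members, n_state) array of background states.
    rho      : inflation factor (> 1 injects extra spread to offset
               sampling error in the ensemble covariance).
    Returns the inflated ensemble; the mean is unchanged, but the
    ensemble covariance is multiplied by rho**2.
    """
    mean = ensemble.mean(axis=0)
    perturbations = ensemble - mean
    return mean + rho * perturbations

# Tiny usage example with a random 10-member, 3-variable ensemble.
rng = np.random.default_rng(0)
ens = rng.normal(size=(10, 3))
infl = inflate_ensemble(ens, rho=1.1)
print(np.cov(ens.T).trace(), np.cov(infl.T).trace())  # covariance trace grows by rho**2
```

Inflating the perturbations increases the background covariance while leaving the ensemble mean untouched, which corresponds to the extra spread, or noise, whose impact on analysis accuracy the abstract investigates.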
Abstract
This study explores the functional relationship between model physics parameters and model output variables for the purpose of 1) characterizing the sensitivity of the simulation output to the model formulation and 2) understanding model uncertainty so that it can be properly accounted for in a data assimilation framework. A Markov chain Monte Carlo algorithm is employed to examine how changes in cloud microphysical parameters map to changes in output precipitation, liquid and ice water path, and radiative fluxes for an idealized deep convective squall line. Exploration of the joint probability density function (PDF) of parameters and model output state variables reveals a complex relationship between parameters and model output that changes dramatically as the system transitions from convective to stratiform. Persistent nonuniqueness in the parameter–state relationships is shown to be inherent in the construction of the cloud microphysical and radiation schemes and cannot be mitigated by reducing observation uncertainty. The results reinforce the importance of including uncertainty in model configuration in ensemble prediction and data assimilation, and they indicate that data assimilation efforts that include parameter estimation would benefit from including additional constraints based on known physical relationships between model physics parameters to render a unique solution.
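To make the Markov chain Monte Carlo idea concrete, here is a schematic random-walk Metropolis sampler for a single microphysics-style parameter; the toy forward model, observation error, and starting point are invented for illustration and are not the cloud model or configuration used in the study. Because the toy forward model is non-monotonic, the resulting posterior is bimodal, mimicking the kind of nonuniqueness in parameter-state relationships described above.

```python
import numpy as np

def toy_forward_model(theta):
    """Stand-in for the cloud model: maps one parameter to one output variable."""
    return np.exp(-theta) + 0.5 * theta**2

def metropolis(y_obs, sigma_obs=0.1, n_iter=5000, step=0.2, rng=None):
    """Random-walk Metropolis sampling of p(theta | y_obs) with a flat prior."""
    rng = rng or np.random.default_rng(0)
    theta = 1.0
    log_like = -0.5 * ((y_obs - toy_forward_model(theta)) / sigma_obs) ** 2
    samples = []
    for _ in range(n_iter):
        prop = theta + step * rng.normal()                 # propose a random step
        log_like_prop = -0.5 * ((y_obs - toy_forward_model(prop)) / sigma_obs) ** 2
        if np.log(rng.uniform()) < log_like_prop - log_like:
            theta, log_like = prop, log_like_prop          # accept the proposal
        samples.append(theta)
    return np.array(samples)

chain = metropolis(y_obs=1.2)
print(chain.mean(), chain.std())  # spread of the chain reflects parameter sensitivity
```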
Abstract
This study uses the local ensemble transform Kalman filter to assimilate Atmospheric Infrared Sounder (AIRS) specific humidity retrievals with pseudo relative humidity (pseudo-RH) as the observation variable. Three approaches are tested: (i) updating specific humidity with observations other than specific humidity (“passive q”), (ii) updating specific humidity only with humidity observations (“univariate q”), and (iii) assimilating the humidity and the other observations together (“multivariate q”). This is the first time that the performance of the univariate and multivariate assimilation of q has been compared within an ensemble Kalman filter framework. The results show that updating the humidity analyses with either AIRS specific humidity retrievals or nonhumidity observations improves both the humidity and wind analyses. The improvement with the multivariate-q experiment is by far the largest for all dynamical variables at both analysis and forecast time, indicating that the interaction between specific humidity and the other dynamical variables through the background error covariance during the data assimilation process yields more balanced analysis fields. In the univariate assimilation of q, the humidity interacts with the other dynamical variables only through the forecast process. The univariate assimilation produces more accurate humidity analyses than those obtained when no humidity observations are assimilated, but it does not improve the accuracy of the zonal wind analyses. The 6-h total column precipitable water forecast also benefits from the improved humidity analyses, with the multivariate-q experiment showing the largest improvement.
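A rough way to see the difference between the univariate-q and multivariate-q strategies is through which cross-covariances are allowed to act in the analysis. The sketch below uses a simple stochastic (perturbed-observations) ensemble Kalman filter update rather than the LETKF of the study; the masking mechanism, variable layout, and all numerical settings are illustrative assumptions.

```python
import numpy as np

def enkf_update(Xb, y, H, R, cross_cov_mask=None, rng=None):
    """Stochastic EnKF analysis step (perturbed observations), for illustration.

    Xb : (n_members, n_state) background ensemble
    y  : (n_obs,) observation vector
    H  : (n_obs, n_state) linear observation operator
    R  : (n_obs, n_obs) observation-error covariance
    cross_cov_mask : optional (n_state, n_obs) 0/1 mask applied to P_b H^T;
        zeroing the rows of non-humidity variables mimics a "univariate"
        update in which humidity observations touch only humidity.
    """
    rng = rng or np.random.default_rng(0)
    n_members = Xb.shape[0]
    Pb = np.cov(Xb, rowvar=False)                       # ensemble background covariance
    PHt = Pb @ H.T
    if cross_cov_mask is not None:
        PHt = PHt * cross_cov_mask                      # suppress selected cross-covariances
    K = PHt @ np.linalg.inv(H @ Pb @ H.T + R)           # Kalman gain
    y_pert = y + rng.multivariate_normal(np.zeros(len(y)), R, size=n_members)
    return Xb + (y_pert - Xb @ H.T) @ K.T
```

With a state ordered as [wind, humidity] and a humidity-only observation operator, passing a mask that zeroes the wind rows of the gain reproduces the univariate-q behavior, whereas omitting the mask lets humidity observations update the wind through the ensemble cross-covariances, as in the multivariate-q experiment.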
Abstract
This study addresses the issue of model errors with the ensemble Kalman filter. Observations generated from the NCEP–NCAR reanalysis fields are assimilated into a low-resolution AGCM. Without an effort to account for model errors, the performance of the local ensemble transform Kalman filter (LETKF) is seriously degraded when compared with the perfect-model scenario. Several methods to account for model errors, including model bias and system noise, are investigated. The results suggest that the two pure bias removal methods considered [Dee and Da Silva (DdSM) and low dimensional (LDM)] are not able to beat the multiplicative or additive inflation schemes used to account for the effects of total model errors. In contrast, when the bias removal methods are augmented by additive noise representing random errors (DdSM+ and LDM+), they outperform the pure inflation schemes. Of these augmented methods, the LDM+, where the constant bias, diurnal bias, and state-dependent errors are estimated from a large sample of 6-h forecast errors, gives the best results. The advantage of the LDM+ over other methods is larger in data-sparse regions than in data-dense regions.
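The sketch below is a loose, simplified illustration in the spirit of the augmented methods mentioned above, combining a bias estimate from an archive of past forecast errors with additive random perturbations; the error archive, scaling factor, and resampling rule are assumptions for illustration and not the LDM+ algorithm itself.

```python
import numpy as np

def correct_and_inflate(background_ens, error_sample, alpha=0.3, rng=None):
    """Subtract an estimated mean bias and add random error perturbations (sketch).

    background_ens : (n_members, n_state) background ensemble
    error_sample   : (n_sample, n_state) archive of past 6-h forecast errors
                     (forecast minus verifying analysis) used to estimate bias
    alpha          : scaling of the additive perturbations (assumed value)
    """
    rng = rng or np.random.default_rng(0)
    bias = error_sample.mean(axis=0)             # constant-bias estimate
    residuals = error_sample - bias              # leftover "random" errors
    draws = residuals[rng.integers(0, len(residuals), size=len(background_ens))]
    return background_ens - bias + alpha * draws # debiased, re-inflated ensemble
```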
Abstract
Real-size assimilation systems rely on estimation theory, in which two sources of information, the background and the observations, are combined in an optimal way. However, the optimality of such large problems is not guaranteed, since they rely on various approximations. The observed values of the subparts of the cost function that measure the distance between the analysis and the two sources of information can be compared with their theoretical statistical expectations. Such a posteriori diagnostics can be used to tune the statistics of both observation and background errors. Moreover, the expectations of the subparts of the cost function associated with observations are related to the weights of the different sources of observations in the analysis. It is shown that these theoretical statistical expectations are direct by-products of an ensemble of perturbed assimilations.
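For reference, the standard a posteriori consistency results alluded to here can be written as follows for a linear, optimal analysis with correctly specified and mutually uncorrelated background and observation error covariances B and R; the notation (gain K = BH^T(HBH^T + R)^{-1}, p observations in total, subset i containing p_i observations) is chosen here for convenience and is not necessarily that of the paper.

```latex
\begin{aligned}
  E\!\left[ 2 J_o(\mathbf{x}_a) \right]     &= p - \operatorname{tr}(\mathbf{H}\mathbf{K}), \\
  E\!\left[ 2 J_b(\mathbf{x}_a) \right]     &= \operatorname{tr}(\mathbf{H}\mathbf{K}), \\
  E\!\left[ 2 J_{o_i}(\mathbf{x}_a) \right] &= p_i - \operatorname{tr}\!\left[ (\mathbf{H}\mathbf{K})_{ii} \right].
\end{aligned}
```

The trace term in the last line measures the weight given to observation subset i in the analysis, which is the link between the cost-function subparts and observation weights mentioned in the abstract; comparing the observed values of these subparts with their expectations is the basis for tuning the error statistics.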
Abstract
A source inversion technique for chemical constituents is presented that uses assimilated constituent observations rather than directly using the observations. The method is tested with a simple model problem, which is a two-dimensional Fourier–Galerkin transport model combined with a Kalman filter for data assimilation. Inversion is carried out using a Green’s function method and observations are simulated from a true state with added Gaussian noise. The forecast state uses the same spectral model but differs by an unbiased Gaussian model error and emissions models with constant errors. The numerical experiments employ both simulated in situ and satellite observation networks. Source inversion was carried out either by directly using synthetically generated observations with added noise or by first assimilating the observations and using the analyses to extract observations. Twenty identical twin experiments were conducted for each set of source and observation configurations, and it was found that in the limiting cases of a very few localized observations or an extremely large observation network there is little advantage to carrying out assimilation first. For intermediate observation densities, the source inversion error standard deviation is decreased by 50% to 90% when the observations are assimilated with the Kalman filter before carrying out the Green’s function inversion.
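As a sketch of the least-squares core of a Green's function inversion (this step only, with uncorrelated observation error and no prior term, and not the full twin-experiment setup of the paper), consider:

```python
import numpy as np

def greens_function_inversion(G, y, obs_var=1.0):
    """Estimate source amplitudes from observations via a Green's function (sketch).

    G : (n_obs, n_src) matrix; column j is the model response at the
        observation locations/times to a unit emission from source j
        (obtained by running the transport model once per source).
    y : (n_obs,) observed, or assimilated-analysis, concentrations.
    Returns the generalized least-squares source estimate.
    """
    R_inv = np.eye(len(y)) / obs_var
    return np.linalg.solve(G.T @ R_inv @ G, G.T @ R_inv @ y)
```

In the study, the vector y would hold either the raw synthetic observations or values extracted from the Kalman filter analyses; the abstract's result is that the latter choice substantially reduces the inversion error at intermediate observation densities.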
Abstract
This article discusses several models for background error correlation matrices based on the wavelet diagonal assumption and the diffusion operator. The most general properties of filtering local correlation functions with wavelet formulations are recalled. Two spherical wavelet transforms, one based on the Legendre spectrum and one defined in gridpoint space, are compared; the latter belongs to the class of second-generation wavelets. In addition, a nonseparable formulation that merges the wavelets and the diffusion operator model is formally proposed. This hybrid formulation is illustrated in a simple two-dimensional framework. These three formulations are tested in a toy experiment on the sphere: a large ensemble of perturbed forecasts is used to simulate a true background error ensemble, which provides a reference. This ensemble is then used to compute the required parameters of each model. A randomization method is used to diagnose the different models; in particular, their ability to represent the geographical variations of the local correlation functions is studied through diagnosis of the local length scale. The results show that the spectrally based wavelet formulation filters the geographical variations of the local correlation length scale but is less able to represent anisotropy. The gridpoint-based wavelet formulation also represents part of the geographical variations, but its correlation functions appear to depend on the grid. Finally, the diffusion-based formulation represents the local length scale quite well.
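As a minimal one-dimensional sketch of the diffusion-operator correlation model discussed above (homogeneous and periodic, so with no geographical variation of the length scale), the code below smooths a Dirac impulse with an implicit diffusion operator and normalizes the result into a correlation function; the grid size and diffusion parameters are arbitrary illustration choices.

```python
import numpy as np

def diffusion_correlation(n=200, kappa=4.0, n_steps=10):
    """Correlation function implied by an implicit 1-D diffusion operator (sketch).

    Repeatedly solving (I - kappa*L) x_{k+1} = x_k, with L a periodic 1-D
    Laplacian, smooths a Dirac impulse into a quasi-Gaussian shape; normalizing
    the result gives one column of a correlation model whose length scale is
    controlled by kappa and n_steps.
    """
    L = -2.0 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)
    L[0, -1] = L[-1, 0] = 1.0                 # periodic boundary conditions
    A = np.eye(n) - kappa * L                 # implicit diffusion step
    x = np.zeros(n)
    x[n // 2] = 1.0                           # Dirac impulse at the centre
    for _ in range(n_steps):
        x = np.linalg.solve(A, x)
    return x / x[n // 2]                      # normalize to a correlation function
```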
Abstract
Model error is the component of the forecast error that is due to the difference between the dynamics of the atmosphere and the dynamics of the numerical prediction model. The systematic, slowly varying part of the model error is called model bias. This paper evaluates three different ensemble-based strategies to account for the surface pressure model bias in the analysis scheme. These strategies are based on modifying the observation operator for the surface pressure observations by the addition of a bias-correction term. One estimates the correction term adaptively, while another uses the hydrostatic balance equation to obtain the correction term. The third strategy combines an adaptively estimated correction term and the hydrostatic-balance-based correction term. Numerical experiments are carried out in an idealized setting, where the National Centers for Environmental Prediction (NCEP) Global Forecast System (GFS) model is integrated at resolution T62L28 to simulate the evolution of the atmosphere and the T30L7 resolution Simplified Parameterization Primitive Equation Dynamics (SPEEDY) model is used for data assimilation. The results suggest that the adaptive bias-correction term is effective in correcting the bias in the data-rich regions, while the hydrostatic-balance-based approach is effective in data-sparse regions. The adaptive bias-correction approach also has the benefit that it leads to a significant improvement of the temperature and wind analysis at the higher model levels. The best results are obtained when the two bias-correction approaches are combined.
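Schematically, the modified observation operator amounts to replacing h(x) by h(x) + b in the surface pressure departures. The sketch below shows one very simple adaptive rule for updating b from the mean departure; this relaxation rule and its memory factor are assumptions for illustration and are not the estimation scheme of the paper.

```python
import numpy as np

def bias_corrected_departure(ps_obs, ps_model_at_obs, bias_estimate, gamma=0.05):
    """Innovation with a bias-corrected observation operator (illustrative sketch).

    The observation operator is modified from h(x) to h(x) + b, so the departure
    used in the analysis is y - h(x) - b.  Here the bias estimate is relaxed toward
    the recent mean departure with memory factor gamma (an assumed, simple
    adaptive rule, used only to illustrate the idea).
    """
    departure = ps_obs - ps_model_at_obs - bias_estimate
    new_bias = bias_estimate + gamma * np.mean(departure)
    return departure, new_bias
```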
Abstract
In geophysical data assimilation, observations shed light on a control parameter space through a model, a statistical prior, and an optimal combination of these sources of information. This control space can be a set of discrete parameters or, more often in geophysics, part of the state space, which is distributed in space and time. When the control space is continuous, it must be discretized for numerical modeling. This discretization, referred to in this paper as a representation of the distributed parameter space, is always fixed a priori.
In this paper, the representation of the control space is considered a degree of freedom on its own. The goal of the paper is to demonstrate that one could optimize it to perform data assimilation in optimal conditions. The optimal representation is then chosen over a large dictionary of adaptive grid representations involving several space and time scales.
First, to motivate the importance of the representation choice, this paper discusses the impact of a change of representation on the posterior analysis of data assimilation and its connection to the reduction of uncertainty. It is stressed that in some circumstances (atmospheric chemistry, in particular) the choice of a proper representation of the control space is essential to set the data assimilation statistical framework properly.
A possible mathematical framework is then proposed for multiscale data assimilation. To keep the developments simple, a measure of the reduction of uncertainty is chosen as a very simple optimality criterion. Using this criterion, a cost function is built to select the optimal representation. It is a function of the control space representation itself. A regularization of this cost function, based on a statistical mechanical analogy, guarantees the existence of a solution.
This allows numerical optimization to be performed on the representation of the control space. The formalism is then applied to the inverse modeling of an accidental release of an atmospheric contaminant at the European scale, using real data.
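As a deliberately simplified illustration of what a change of representation of the control space involves, the sketch below aggregates adjacent fine-grid control variables into coarser cells with an averaging operator Gamma and propagates the prior covariance as Gamma B Gamma^T; the uniform block size and averaging rule are assumptions for illustration, not the dictionary of adaptive multiscale grids optimized in the paper.

```python
import numpy as np

def coarsen_representation(x_fine, B_fine, group_size):
    """Change of representation by aggregating fine-grid control variables (sketch).

    x_fine     : (n,) control vector on the fine grid
    B_fine     : (n, n) prior (background) covariance on the fine grid
    group_size : number of adjacent fine cells averaged into one coarse cell
                 (assumed to divide n evenly for simplicity)
    Returns the coarse control vector Gamma @ x and its covariance
    Gamma @ B @ Gamma.T, where Gamma averages each block of adjacent cells.
    """
    n = len(x_fine)
    m = n // group_size
    Gamma = np.zeros((m, n))
    for i in range(m):
        Gamma[i, i * group_size:(i + 1) * group_size] = 1.0 / group_size
    return Gamma @ x_fine, Gamma @ B_fine @ Gamma.T
```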
Abstract
Particle filters are ensemble-based assimilation schemes that, unlike the ensemble Kalman filter, employ a fully nonlinear and non-Gaussian analysis step to compute the probability distribution function (pdf) of a system’s state conditioned on a set of observations. Evidence is provided that the ensemble size required for a successful particle filter scales exponentially with the problem size. For the simple example in which each component of the state vector is independent, Gaussian, and of unit variance and the observations are of each state component separately with independent, Gaussian errors, simulations indicate that the required ensemble size scales exponentially with the state dimension. In this example, the particle filter requires at least 10^11 members when applied to a 200-dimensional state. Asymptotic results, following the work of Bengtsson, Bickel, and collaborators, are provided for two cases: one in which each prior state component is independent and identically distributed, and one in which both the prior pdf and the observation errors are Gaussian. The asymptotic theory reveals that, in both cases, the required ensemble size scales exponentially with the variance of the observation log likelihood rather than with the state dimension per se.
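The weight collapse behind this exponential scaling can be reproduced numerically in the independent-Gaussian example described above. The sketch below computes the largest normalized importance weight among the particles; the particle count and dimensions are arbitrary choices, and the code is an illustration rather than the experiments of the paper.

```python
import numpy as np

def max_weight(dim, n_particles=1000, obs_var=1.0, rng=None):
    """Maximum normalized importance weight in the Gaussian toy problem (sketch).

    Particles are drawn from a standard Gaussian prior in `dim` dimensions;
    each state component is observed with independent Gaussian error.
    As `dim` grows, one particle takes almost all of the weight (collapse).
    """
    rng = rng or np.random.default_rng(0)
    truth = rng.standard_normal(dim)
    obs = truth + np.sqrt(obs_var) * rng.standard_normal(dim)
    particles = rng.standard_normal((n_particles, dim))        # prior draws
    log_w = -0.5 * np.sum((obs - particles) ** 2, axis=1) / obs_var
    w = np.exp(log_w - log_w.max())                            # stable normalization
    return (w / w.sum()).max()

for d in (5, 20, 100):
    print(d, max_weight(d))   # the maximum weight approaches 1 as the dimension grows
```

As the state dimension grows, the largest weight approaches one, i.e., a single particle carries essentially all of the posterior mass unless the ensemble size grows exponentially.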