## 1. Introduction

In data assimilation, the process of detecting and accounting for observation errors that are statistical outliers is called *quality control* (QC; e.g., Daley 1991). An operational numerical weather prediction system may employ multiple layers of QC. For instance, observations with implausible values are usually rejected even before they enter the data assimilation process. We refer to the algorithms used for such rejection decisions as *offline QC* algorithms. The fact that an observation passes the offline QC procedures does not guarantee that it is not a statistical outlier, however. For instance, an error in a highly accurate observation can be a statistical outlier when the error has a large representativeness error component. Such errors have to be dealt with by the data assimilation algorithm. We refer to the QC procedures that are part of the data assimilation algorithms as *online QC* algorithms.

Online QC algorithms detect observation errors that are statistical outliers by examining the difference between the observation and the prediction of the observation by the background. This difference is called the innovation. For instance, a simple online QC can be implemented by rejecting the observations for which the absolute value of the innovation is larger than a prescribed threshold. Another approach, which is more desirable from a theoretical point of view, is to employ *robust statistics* in the formulation of the state-update step of the data assimilation scheme (e.g., Huber 1981; Hampel 1968; Maronna et al. 2006). In particular, the presumed probability distribution of the observation errors can be modified such that the update step can anticipate errors that would be considered statistical outliers if the observation errors were strictly Gaussian. The practical challenge posed by this approach is to find a modification of the prescribed probability distribution function that leads to a data assimilation algorithm that can be implemented in practice.

An operational online QC algorithm using robust observation error statistics (Anderson and Järvinen 1999) was first introduced by the European Centre for Medium-Range Weather Forecasts (ECMWF). The general idea of this approach was to define the probability distribution of the observation errors as the sum of two probability distributions: a normal distribution representing the “normal” observation errors and another distribution representing the “gross” observation errors. This approach was originally proposed as an offline QC procedure by Ingleby and Lorenc (1993), but the variational framework made its integration into the data assimilation scheme possible. The formulation of the algorithm by Anderson and Järvinen (1999) became known as variational QC (Var-QC). In the latest operational version of Var-QC, called the *Huber norm QC* (Tavolato and Isaksen 2010), the negative log-probability of medium and large observation errors increases linearly, so the probability of such errors decreases more slowly than for a Gaussian distribution but faster than for a uniform distribution.

A wide variety of robust filtering schemes has been proposed in the mathematical statistics literature in the past decades. In particular, Meinhold and Singpurwalla (1989) replaced the normality assumption with fat-tailed distributions such as the *t* distribution, whereas Naveau et al. (2005) considered a skewed version of the normal distribution. West (1981, 1983, 1984) suggested a method for robust sequential approximate Bayesian estimation. Fahrmeir and Kaufmann (1991) and Fahrmeir and Kunstler (1999) offered posterior mode estimation and penalized likelihood smoothing in robust state-space models. Kassam and Poor (1985) discussed the minimax approach for the design of robust filters for signal processing. Schick and Mitter (1994) derived a first-order approximation for the conditional prior distribution of the state. Ershov and Liptser (1978), Stockinger and Dutter (1987), Martin and Raftery (1987), Birmiwal and Shen (1993), and Birmiwal and Papantoni-Kazakos (1994) also proposed robust filtering schemes that were resistant to outliers.

Recently, Ruckdeschel (2010) proposed a robust Kalman filter in the setting of time-discrete linear Euclidean state-space models, with an extension to hidden Markov models, that is optimal in the sense of minimax mean-squared error. He used the Huberization method but investigated its performance only on a one-dimensional linear system. Luo and Hoteit (2011) employed the *H*_{∞} filter to make ensemble Kalman filters (EnKFs) robust to gross background errors. The *H*_{∞} filter minimizes the supremum of a cost function, in contrast to the minimum-variance criterion of the Kalman filter. They demonstrated their approach on both a one-dimensional linear model and a multidimensional nonlinear model. Calvet et al. (2012) introduced an impact function that quantifies the sensitivity of the state distribution and proposed a filter with a bounded impact function.

EnKFs have been successfully implemented in highly complex operational prediction models in the atmospheric and oceanic sciences. They are Monte Carlo approximations of the traditional Kalman filter (KF; Kalman 1960) and use ensembles of forecasts to estimate the mean and covariance of the presumed normal distribution of the background. Like the KF, EnKFs are not robust to gross errors in the estimate of the background mean or in the observations (e.g., Schlee et al. 1967). The main goal of this paper is to design an EnKF scheme that is robust to observation errors that are statistical outliers. Harlim and Hunt (2007) and Luo and Hoteit (2011) made the EnKF robust to unexpectedly large background errors. Here, we propose to make the EnKF robust to gross observation errors by Huberization, a procedure that can be applied to any EnKF scheme.

The rest of the paper is organized as follows. Section 2 first illustrates the effects of gross observation errors on the performance of EnKF; then, it describes our proposed approach to cope with such errors. Section 3 demonstrates the effectiveness of our approach for a one-dimensional linear system, while section 4 shows the results for the 40-variable Lorenz model. Finally, section 5 summarizes the main results of the paper.

## 2. A robust ensemble Kalman filter

### a. Ensemble Kalman filters

Let $\mathbf{x}_t \in \mathbb{R}^n$ be a finite-dimensional representation of the state of the atmosphere at time $t$, and let $\mathbf{y}_t \in \mathbb{R}^p$ be the vector of observations at time $t$. The observation equation at time $t$ is

$$\mathbf{y}_t = \mathbf{H}_t \mathbf{x}_t + \boldsymbol{\varepsilon}_t,$$

where $\mathbf{H}_t \in \mathbb{R}^{p \times n}$ is the *observation operator* and the random variable $\boldsymbol{\varepsilon}_t \in \mathbb{R}^p$ represents the observation error, with prescribed covariance matrix $\mathbf{R}_t$. The Kalman filter provides an estimate of the state $\mathbf{x}_t$ based on the observations taken at the past and the present observation times and on the assumed knowledge (model) of the dynamics.

An EnKF represents the uncertainty in the state estimate by an $M$-member ensemble (sample) of model states at time $t$. This ensemble is called the *background ensemble*. The mean of the background ensemble, $\bar{\mathbf{x}}^b_t$, called the *background*, is our best estimate of the state $\mathbf{x}_t$ before the assimilation of the observations taken at time $t$. The analysis step of an EnKF generates an *analysis ensemble*, whose mean, $\bar{\mathbf{x}}^a_t$, called the *analysis*, satisfies

$$\bar{\mathbf{x}}^a_t = \bar{\mathbf{x}}^b_t + \mathbf{K}_t\left(\mathbf{y}_t - \mathbf{H}_t \bar{\mathbf{x}}^b_t\right), \qquad (3)$$

and whose spread represents the *analysis error covariance matrix*

$$\mathbf{P}^a_t = \left(\mathbf{I} - \mathbf{K}_t \mathbf{H}_t\right)\mathbf{P}^b_t, \qquad (5)$$

where the *Kalman gain matrix* $\mathbf{K}_t \in \mathbb{R}^{n \times p}$ is given by

$$\mathbf{K}_t = \mathbf{P}^b_t \mathbf{H}_t^{\mathsf{T}}\left(\mathbf{H}_t \mathbf{P}^b_t \mathbf{H}_t^{\mathsf{T}} + \mathbf{R}_t\right)^{-1}, \qquad (6)$$

and the *background error covariance matrix* $\mathbf{P}^b_t$ is estimated by the sample covariance of the background ensemble. The assimilation cycle at time $t$ is completed by the forecast step of the EnKF, in which the model dynamics are applied to each member of the analysis ensemble to obtain the members of the background ensemble for the next observation time, $t + 1$.
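For readers who prefer code, the analysis step above can be sketched in a few lines of NumPy. This is a minimal perturbed-observations variant (cf. Burgers et al. 1998), not the exact implementation used in the paper; the function name and interface are our own.

```python
import numpy as np

def enkf_analysis(Xb, y, H, R, rng):
    """Perturbed-observations EnKF analysis step (a minimal sketch).

    Xb : (n, M) background ensemble; y : (p,) observation vector;
    H  : (p, n) linear observation operator; R : (p, p) obs-error covariance.
    """
    n, M = Xb.shape
    xb_mean = Xb.mean(axis=1, keepdims=True)
    Xp = Xb - xb_mean                                  # background perturbations
    Pb = Xp @ Xp.T / (M - 1)                           # sample background covariance
    K = Pb @ H.T @ np.linalg.inv(H @ Pb @ H.T + R)     # Kalman gain, cf. Eq. (6)
    # Assimilate perturbed observations so the analysis ensemble keeps spread.
    Yp = y[:, None] + rng.multivariate_normal(np.zeros(len(y)), R, size=M).T
    Xa = Xb + K @ (Yp - H @ Xb)                        # member-wise update, cf. Eq. (3)
    return Xa, K
```

Deterministic (square root) EnKF variants omit the observation perturbations; the Huberization proposed below applies to either class of schemes.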

The components of the vector of differences $\mathbf{y}_t - \mathbf{H}_t \bar{\mathbf{x}}^b_t$ are called *innovations* (each innovation describes the discrepancy between an observation and its predicted value). In addition, the components of the change in the state estimate, $\bar{\mathbf{x}}^a_t - \bar{\mathbf{x}}^b_t$, caused by the assimilation of $\mathbf{y}_t$, are called *analysis increments*. The role of the Kalman gain matrix $\mathbf{K}_t$ is to map the innovations into analysis increments. According to Eq. (6), the Kalman gain accounts for the observation errors based on the prescribed error statistics included in $\mathbf{R}_t$. It thus has no information about the errors in a particular observation or the magnitude of a particular innovation. Since the analysis increments are unbounded functions of the innovations, a large innovation due to a gross (outlier) observation error can cause a large degradation in the accuracy of the state estimate.

### b. The effects of observation outliers

We consider two models of gross observation errors. In the *additive outlier* (AO) model (Fox 1972), the observation equation becomes

$$\mathbf{y}_t = \mathbf{H}_t \mathbf{x}_t + \boldsymbol{\varepsilon}_t + \boldsymbol{\xi}_t,$$

where $\boldsymbol{\xi}_t \in \mathbb{R}^p$ is a vector of unknown outlying values. It is assumed that only a few components of $\boldsymbol{\xi}_t$ are nonzero. In the *innovations outlier* (IO) model, the observation error is drawn from the Gaussian mixture

$$\boldsymbol{\varepsilon}_t \sim (1 - \alpha)\, N_p\!\left(\mathbf{0}, \mathbf{R}_t\right) + \alpha\, N_p\!\left(\mathbf{0}, k_t \mathbf{R}_t\right),$$

where $0 < \alpha < 1$, $k_t > 1$, and $N_p$ denotes the $p$-variate Gaussian distribution. That is, the observation errors have a zero mean and a probability $1 - \alpha$ of coming from a normal distribution with covariance matrix $\mathbf{R}_t$ and a (usually small) probability $\alpha$ of coming from a normal distribution with the higher variances of $k_t \times \mathbf{R}_t$. The value of $k_t$ is assumed to be unknown. The additive outlier model corresponds to a situation where some of the observations are affected by a strong observation bias, whereas the innovations outlier model corresponds to a situation where there is a $100 \times \alpha$ percent chance that the observation error variance is larger than the prescribed value given by $\mathbf{R}_t$.

We illustrate the effects of the two outlier models on a one-dimensional linear system [Eqs. (9) and (10)], in which the model noise $e_t$ and the observation noise $\varepsilon_t$ are zero-mean Gaussian processes with unit variance. The results shown in Fig. 1 were obtained by using the traditional EnKF algorithm to obtain the analysis ensemble that satisfies Eqs. (3) and (5) for Eqs. (9) and (10). We assimilate observations at every time step. The outliers occur at the times where the errors are marked by open circles. The top panel shows the results for the AO model, with $\xi_t = 5$ for the outliers, while the bottom panel shows the results for the IO model, with $\alpha = 0.2$ and $k_t = 25$ for the outliers. The accuracy of the state estimates is clearly degraded at the time steps where outliers are present in either outlier model.
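The two outlier models can be sketched for a scalar observation as follows; the scalar setting and the function names are our illustrative assumptions.

```python
import numpy as np

def observe_ao(x_true, sigma, xi, is_outlier, rng):
    """Additive outlier (AO) model: a bias xi is added at the outlier times."""
    eps = rng.normal(0.0, sigma)
    return x_true + eps + (xi if is_outlier else 0.0)

def observe_io(x_true, sigma, k, alpha, rng):
    """Innovations outlier (IO) model: with probability alpha the error
    variance is inflated by the factor k (a two-component Gaussian mixture)."""
    scale = np.sqrt(k) * sigma if rng.uniform() < alpha else sigma
    return x_true + rng.normal(0.0, scale)
```

With `sigma = 1`, `alpha = 0.2`, and `k = 25`, the IO errors have overall variance `(1 - 0.2) * 1 + 0.2 * 25 = 5.8`, even though the prescribed value is 1; this is the contamination the filter must cope with.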

### c. A robust ensemble Kalman filter

The detrimental effect of the outliers on the EnKF state estimate can be reduced by decreasing the magnitude of those components of the innovation vector that have unusually large absolute values. This can be done by defining an upper bound for the allowable absolute value of the innovations. When the magnitude of an innovation is found to be larger than the prescribed upper bound, the magnitude of the innovation can be clipped at the upper bound. To be precise, the innovation *δy* is left unchanged if −*c* < *δy* < *c* for some *c* > 0 and clipped at −*c* if *δy* < −*c* and at *c* if *δy* > *c*. This componentwise clipping of the innovation is called *Huberization*, and the tunable parameter *c* is called the *clipping height*.
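The componentwise clipping just described is a one-liner in NumPy; this sketch is our own, with `np.clip` standing in for the clipping operation.

```python
import numpy as np

def huberize(innov, c):
    """Componentwise Huber clipping: each innovation is left unchanged inside
    [-c_i, c_i] and clipped to the nearer bound outside it."""
    c = np.asarray(c)
    return np.clip(innov, -c, c)
```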

The analysis obtained from the clipped innovations, the *Huberized analysis*, is computed by replacing the innovation vector in Eq. (3) with its clipped value:

$$\bar{\mathbf{x}}^a_t = \bar{\mathbf{x}}^b_t + \mathbf{K}_t\, G_{\mathbf{c}}\!\left(\mathbf{y}_t - \mathbf{H}_t \bar{\mathbf{x}}^b_t\right),$$

where, for a vector $\mathbf{u}$, the *Huber* function $G_{\mathbf{c}}(\mathbf{u})$ is defined by ($i = 1, \ldots, p$)

$$\left[G_{\mathbf{c}}(\mathbf{u})\right]_i = \max\left\{-c_i, \min\left(u_i, c_i\right)\right\},$$

where $c_i$ and $u_i$ are the $i$th elements of $\mathbf{c}$ and $\mathbf{u}$, respectively. The innovation vector is thus clipped componentwise by a clipping-height vector of the same dimension. When Huberization achieves its goal of reducing the contamination of the prescribed distribution of the observation errors, the observation error covariance matrix $\mathbf{R}_t$ provides a better representation of the observation error covariance. Hence, we do not modify any entries of $\mathbf{R}_t$.

A simple alternative to Huberization for handling observation error outliers is to discard the suspect observations from the data assimilation process. In fact, this is the online QC approach that has been employed by EnKF algorithms in weather prediction models (e.g., Szunyogh et al. 2008). In the simple numerical examples given here, we discard the observation if $|\delta y| > c$ for a prescribed $c$. In these applications, the prescribed smallest magnitude of the innovation that triggers a rejection of the observation depends on the magnitude of the ensemble-based estimate of the background error variance at the observation location (the related entry of $\mathbf{P}^b_t$). Because this approach is based on discarding the observation rather than reducing the contamination from the observation error, the entries of $\mathbf{R}_t$ that are related to the discarded observation must also be removed.
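A sketch of the discarding alternative, under the assumption of a linear observation operator: suspect observations are dropped together with the corresponding rows of the observation operator and the corresponding rows and columns of the observation error covariance matrix. The function name and interface are our own.

```python
import numpy as np

def discard_outliers(y, H, R, innov, c):
    """Remove observations whose innovation magnitude exceeds the threshold c.

    Rows of y and H, and the matching rows/columns of R, are dropped so that
    the prescribed error statistics stay consistent with the kept observations.
    """
    keep = np.abs(innov) <= c
    return y[keep], H[keep, :], R[np.ix_(keep, keep)]
```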

### d. Choosing parameter **c**

The tunable parameter of both strategies to handle the outlier observation errors, which were described in section 2c, is the $p$-dimensional vector $\mathbf{c} \in \mathbb{R}^p_+$ of clipping heights. An ideal choice of $\mathbf{c}$ would remove the contamination from the observation error, or lead to the rejection of the observation, without making any change in the state estimates obtained from clean, outlier-free observations. While such an ideal choice for $\mathbf{c}$ usually does not exist, we can define a measure of our tolerance for degradation in the accuracy of the state estimates for clean observations.

Such a measure is the *relative efficiency*. The relative efficiency of two algorithms to estimate the state is defined by the ratio of the variance of the error in the two estimates they provide. The relative efficiency of the EnKF with and without online QC is

$$\delta = \frac{E\left(\left|\bar{\mathbf{x}}^a - \mathbf{x}\right|^2_{\mathrm{id}}\right)}{E\left(\left|\bar{\mathbf{x}}^a_{\mathbf{c}} - \mathbf{x}\right|^2_{\mathrm{id}}\right)},$$

where $\bar{\mathbf{x}}^a_{\mathbf{c}}$ denotes the analysis obtained with online QC and $\delta \in (0, 1]$. Here, $|\cdot|$ denotes the Euclidean norm and the subscript id indicates that the norm is to be computed for clean, outlier-free observations. If no quality control was applied (the components of $\mathbf{c}$ were set to infinity), then the relative efficiency would be $\delta = 1$. Equivalently, achieving a perfect relative efficiency, $\delta = 1$, would require choosing $\mathbf{c} = \infty$. The lower the value of $\delta$ we accept, the lower the values we can choose for the components of $\mathbf{c}$. A common choice for the relative efficiency is $\delta = 0.95$ (e.g., Huber 1981).
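As an illustration of the efficiency criterion, the following Monte Carlo sketch estimates the relative efficiency of the Huberized update for a scalar state observed directly on clean observations. The unit error variances, the function name, and the scalar form of the efficiency ratio are our assumptions.

```python
import numpy as np

def relative_efficiency(c, sig_b=1.0, sig_o=1.0, n=200_000, seed=0):
    """Monte Carlo estimate of the relative efficiency delta(c) of the
    Huberized scalar analysis on clean (outlier-free) observations.

    Background error e_b ~ N(0, sig_b^2), observation error eps ~ N(0, sig_o^2);
    the innovation is v = eps - e_b and K is the scalar Kalman gain.
    """
    rng = np.random.default_rng(seed)
    e_b = rng.normal(0.0, sig_b, n)
    eps = rng.normal(0.0, sig_o, n)
    v = eps - e_b
    K = sig_b**2 / (sig_b**2 + sig_o**2)
    err_plain = e_b + K * v                      # analysis error, no clipping
    err_huber = e_b + K * np.clip(v, -c, c)      # analysis error, Huberized
    return np.mean(err_plain**2) / np.mean(err_huber**2)
```

For a large clipping height the ratio is essentially 1 (no clipping occurs on clean data); as `c` shrinks, the efficiency falls below 1, which is the degradation the criterion bounds.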

To make use of the relative efficiency in the selection of $\mathbf{c}$, we have to find a practical approach to computing the components of $\mathbf{c}$ for a given value of $\delta$. In Kalman filtering, the variance of the analysis error is usually estimated by the trace of the analysis error covariance matrix. The Huber function $G_{\mathbf{c}}(\mathbf{u})$, however, is a nonlinear function of the innovation. It cannot thus be absorbed into the Kalman gain matrix, and no closed-form expression for the error variance of the Huberized analysis is available. Hence, after dropping the subscript $t$ that denotes the time, the expectations that define the relative efficiency have to be evaluated by Monte Carlo integration. The $i$th component $c_i$ of the clipping height $\mathbf{c}$ is obtained by solving the relative-efficiency equation for the assimilation of the $i$th observation alone: the analysis increment is $\mathbf{K}_i\, G_{c_i}(\delta y_i)$, where $\mathbf{K}_i$ is the $i$th column of the Kalman gain matrix and the clean observation errors are simulated from $\boldsymbol{\varepsilon} \sim N_p(\mathbf{0}, \mathbf{R})$. That is, the $c_i$ used to clip the $i$th innovation is chosen as if the analysis process consisted of assimilating the $i$th observation only. The selected clipping heights vary according to whether an out-of-range innovation is clipped to $\pm c_i$ by Huberizing or the observation is discarded.

An alternative criterion, the *radius* criterion, chooses $c_i$ such that

$$E\left(\left|\delta y_i\right| - c_i\right)_+ = r\, E\left(\left|\delta y_i\right|\right),$$

where $r \in (0, 1)$ and $(x)_+ = \max(x, 0)$. The radius $r$ prescribes the proportion of the innovation magnitude that is clipped on average. The clipping heights are the same for either type of clipping because this criterion does not depend on how we clip the innovations. A smaller radius provides a larger clipping height and fewer clipped innovations. This radius criterion has been used to select the clipping height in a robust Kalman filter scheme (Ruckdeschel 2010).
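The radius criterion reduces to a one-dimensional root-finding problem for each innovation. Below is a Monte Carlo bisection sketch under an assumed Gaussian innovation with standard deviation `sig_v`; the function name is our own.

```python
import numpy as np

def clipping_height_radius(r, sig_v=1.0, n=400_000, seed=0):
    """Solve E(|v| - c)_+ = r * E|v| for c by bisection, with v ~ N(0, sig_v^2).

    The expectations are approximated by Monte Carlo; the left-hand side is a
    decreasing function of c, so bisection on [0, 10*sig_v] converges.
    """
    rng = np.random.default_rng(seed)
    av = np.abs(rng.normal(0.0, sig_v, n))
    target = r * av.mean()
    lo, hi = 0.0, 10.0 * sig_v
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if np.maximum(av - mid, 0.0).mean() > target:
            lo = mid          # too much clipping: raise the height
        else:
            hi = mid          # too little clipping: lower the height
    return 0.5 * (lo + hi)
```

As the text states, a smaller radius yields a larger clipping height; the sketch reproduces this monotonicity.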

Two practical issues arise in selecting the clipping height $\mathbf{c}$ from the sample covariance matrices. First, a small ensemble size may produce inaccurate estimates of the covariance matrices (Whitaker and Hamill 2002). Second, performing the Monte Carlo integration to choose the clipping height $\mathbf{c}$ at every time step is time consuming. To increase the accuracy of the covariance matrices and to save computation time, we may compute a single clipping height $\mathbf{c}$ to use at every time step from the limiting background error covariance, in case we can obtain the limit, instead of recomputing $\mathbf{c}$ from the sample covariance at each time $t$. If we let $\mathbf{P}^b_\infty$ be the unknown $n \times n$ true covariance matrix at $t = \infty$, then, when $M$ is sufficiently large, we can assume that the time-averaged sample background covariance matrix approximates $\mathbf{P}^b_\infty$.

## 3. A one-dimensional linear system

We investigate the performance of the robust ensemble Kalman filter (REnKF) for this system using 20-member ensembles and a variance inflation factor of 1.1. A limit of the sample variance of the ensembles is used to select the clipping height *c*. We use 500 replications for the graphical representations with boxplots (Tukey 1970). The efficiencies *δ* = 0.99, 0.95, 0.9, 0.8, and 0.7 correspond, respectively, to the clipping heights *c* = 4.25, 2.64, 2.19, 1.60, and 1.21 when we Huberize the observations. The same efficiencies correspond to the clipping heights *c* = 6.02, 4.80, 4.40, 3.71, and 3.21 when we discard the observations. The radii *r* = 0.0001, 0.001, 0.003, 0.005, and 0.01 correspond, respectively, to the clipping heights *c* = 5.20, 4.24, 3.77, 3.48, and 3.14, whether we Huberize or discard the observations.

To see the impact of additive outliers, we suppose that additive outliers with *ξ*_{t} = 8 are present in the data at *t* = 31, 32, and 33. Figure 2 shows the boxplots of the bias versus the efficiency *δ*, and Fig. 3 shows the boxplots of the bias versus the radius *r*. As the clipping value *c* decreases, that is, as the efficiency decreases or as the radius increases, the bias of the robust estimators shrinks, whereas the error variance decreases to a point but then increases again. The chosen clipping heights are in the range where the error variance keeps increasing. The bias of the Huberizing filter decreases to zero more slowly than that of the discarding filter, but its error variance also increases more slowly. The bias starts to recover at *t* = 34, when the outliers disappear. Figure 4 shows the trajectories of the true state, the traditional ensemble Kalman filter, and two robust ensemble Kalman filters with efficiencies *δ* = 0.99 and 0.7. Both robust ensemble Kalman filters have smaller jumps in the state estimation errors at the times of the outliers than the traditional ensemble Kalman filter has. At efficiency *δ* = 0.99, the discarding filter removes the jump entirely, consistent with its bias of zero, but at efficiency *δ* = 0.7 its estimates are inaccurate, consistent with the large error variance shown in Fig. 2. Figure 5 shows the trajectories of the true state, the traditional ensemble Kalman filter, and two robust ensemble Kalman filters with radii *r* = 0.0001 and 0.01. For *r* = 0.01, the discarding filter is more precise than the Huberizing filter at *t* = 31, 32, and 33, but it is less precise in the absence of outliers, from *t* = 10 to 20. This agrees with Fig. 3: the larger the radius, the smaller the bias and the larger the error variance.

To examine the effect of innovations outliers, we suppose that innovations outliers with *k*_{t} = 25 occur at *t* = 31, 32, and 33. Figure 6 shows the boxplots of the bias versus the efficiency *δ*, and Fig. 7 shows the boxplots of the bias versus the radius *r*. The bias stays at zero for all filters because the innovations outliers have zero mean. In terms of the error variance, the robust ensemble Kalman filters have an increasing error variance as the efficiency *δ* decreases or as the radius *r* increases. At *t* = 31, 32, and 33, the traditional ensemble Kalman filter has the largest error variance. The efficiency criterion gives a smaller error variance than the radius criterion because it leads to a larger clipping value and therefore clips less. Huberization outperforms discarding the observations in terms of the error variance. At times with no outliers, however, the robust ensemble Kalman filter has a larger error variance than the traditional ensemble Kalman filter.

## 4. A multidimensional nonlinear system

### a. The Lorenz model

We use the 40-variable model of Lorenz and Emanuel (1998) with an additive stochastic forcing term *dw*_{i}. The model equation is given by

$$dx_i = \left[\left(x_{i+1} - x_{i-2}\right)x_{i-1} - x_i + F\right]dt + dw_i, \qquad i = 1, \ldots, 40,$$

where *dw*_{i} is a scalar from a Gaussian distribution with a zero mean and variance of 0.05, *F* = 8, and the boundary conditions are assumed to be periodic. We use a fourth-order stochastic Runge–Kutta scheme with a time step of 0.05 nondimensional units to integrate the model. The background ensemble members are initialized from random fields and integrated for 500 steps. Each state variable is observed directly, and observations having uncorrelated errors are assimilated at every time step. The observation equation is

$$\mathbf{y}_t = \mathbf{x}_t + \boldsymbol{\varepsilon}_t,$$

where *ε*_{t} is zero-mean white noise with covariance matrix **I**_{40}, and **I**_{40} is the identity matrix of size 40. The model is integrated for 190 time steps, and the first 100 time steps are discarded. We use 20-member ensembles, a localization constant of 15, and an ensemble inflation factor of 1.07, following Whitaker and Hamill (2002). Experiments were conducted using the EnKF and REnKF with perturbed observations.
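A sketch of the model integration: a deterministic fourth-order Runge–Kutta step with additive Gaussian noise as a simplified stand-in for the stochastic Runge–Kutta scheme used in the paper (whose exact noise handling may differ). Function names are our own.

```python
import numpy as np

F = 8.0  # forcing constant

def l96_tendency(x):
    """Lorenz-96 tendency dx_i/dt = (x_{i+1} - x_{i-2}) x_{i-1} - x_i + F,
    with periodic boundary conditions implemented via np.roll."""
    return (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + F

def step(x, dt=0.05, noise_var=0.05, rng=None):
    """One RK4 step; Gaussian noise with variance scaled by dt (a
    Brownian-increment convention, assumed here) sketches the forcing dw."""
    k1 = l96_tendency(x)
    k2 = l96_tendency(x + 0.5 * dt * k1)
    k3 = l96_tendency(x + 0.5 * dt * k2)
    k4 = l96_tendency(x + dt * k3)
    x_new = x + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
    if rng is not None:
        x_new = x_new + rng.normal(0.0, np.sqrt(noise_var * dt), x.size)
    return x_new
```

Note that the state with every component equal to *F* is a fixed point of the deterministic tendency; small perturbations around it grow chaotically, which is why the background ensemble is spun up for 500 steps before the experiments.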

### b. Choice of the clipping height for the Lorenz model

We discuss how to choose the clipping height **c** and investigate the behavior of the robust ensemble Kalman filter for the Lorenz model. We use the average of the sample background covariance matrices from *t* = 101 to 300, computed with *M* = 10 000 ensemble members, to select a 40-dimensional clipping height vector **c** based on a Monte Carlo integration method. Figure 8 illustrates the 21st column of the averaged sample background covariance matrix. The sample background covariance matrices were computed by running the model forward and assimilating the observations.

Since the dynamics of the model, the distribution of the observations, and the observation error statistics are homogeneous, all components of the clipping height vector **c** have similar values. The radii *r* = 0.0001, 0.0005, 0.001, 0.01, and 0.05 correspond, respectively, to the clipping heights 3.30, 2.89, 2.7, 2, and 1.44. The efficiencies *δ* = 0.9999, 0.999, 0.99, 0.985, and 0.98 correspond, respectively, to the clipping heights 2.45, 1.62, 0.55, 0.32, and 0.16 when we Huberize the observations, and to the clipping heights 3.8, 3, 1.8, 1.43, and 1.06 when we discard the observations. We use 200 replications for the graphical representations with boxplots.

### c. The effects of outliers

We assume that additive outliers with *ξ*_{t} = 10 occur for the neighboring variables 11, 12, and 13 at *t* = 71, 72, and 73. Figure 9 shows the boxplots of the bias versus the efficiency in the presence of these additive outliers, and Fig. 10 shows the boxplots of the bias versus the radius. As the clipping value decreases, that is, as the radius increases or as the efficiency decreases, the bias of the robust filters gets closer to zero, similar to the behavior observed in the one-dimensional linear system. The discarding filter forces the bias to go to zero faster than the Huberization filter. The error variance decreases to a point but then increases again as the clipping value **c** decreases. The explanation for this behavior is that a proper clipping height truncates the observations safely, but a clipping height that is too small clips the observations too much: the Huber function *G*_{**c**} cuts, in addition to *ξ*_{t}, a significant portion of the legitimate part of the innovation. The bias remains nonzero from *t* = 71 to 73 because the outliers are carried over and not perfectly removed. The bias starts to recover at *t* = 74, when outliers no longer occur. Figure 11 shows the time evolution of a component of the state vector for the true state, the traditional ensemble Kalman filter, and two robust ensemble Kalman filters with efficiencies *δ* = 0.9999 and 0.98. Consistent with the large error variance in the boxplots, the estimate with efficiency 0.98 is imprecise because the maximum relative efficiency that the robust ensemble Kalman filter can achieve for each observation is 0.9747. Figure 12 shows the time evolution of a component of the state vector for the true state, the traditional ensemble Kalman filter, and two robust ensemble Kalman filters with radii *r* = 0.0001 and 0.05. At the same radius, 0.05, the Huberization filter is more accurate than the discarding filter.

To investigate the effect of innovations outliers, we assume that the observation error comes from white noise with an extreme variance at variables 11, 12, and 13 at *t* = 71, 72, and 73. Figure 13 shows the boxplots of the bias versus the efficiency in the presence of innovations outliers with *k*_{t} = 100 in the Lorenz model, and Fig. 14 shows the boxplots of the bias versus the radius for the same outliers. For *r* > 0 and *δ* < 1, the bias stays at zero, but the error variance decreases to a certain point and then increases again as the clipping height decreases; at *t* = 70, when no outliers occur, both robust ensemble Kalman filters experience a loss of accuracy.

## 5. Discussion

We proposed a robust ensemble Kalman filter (REnKF) for the robust estimation of the state of a spatiotemporal dynamical system in the presence of observational outliers. We applied this filter to a one-dimensional linear system and to a multidimensional nonlinear system. Using this filtering technique, which is based on the Huberization method, the negative effects of the outliers on the state estimates can be greatly reduced. The clipping values were selected using the efficiency and radius criteria. We compared the results of the REnKF with those of the classical ensemble Kalman filter, and we compared the REnKF based on Huberization, which pulls the outliers back to **c** or −**c**, with the REnKF that discards the outliers. We found that, compared to the conventional EnKF, the REnKF reduced the bias in the state estimates at the expense of an increased error variance. The increase of the error variance differed between the two filtering methods. The Huberization filter was found to perform better than the discarding filter for the examples given in the paper, which may be because the forecast model used in the experiments is the same model that generated the true state. The REnKF is efficient with simple models, and we plan to test it in realistic ocean and atmospheric systems.

Finding the proper clipping values for a data assimilation system that assimilates many types of observations using a complex model is expected to be a labor intensive process. There is no reason to believe, however, that the process would be more challenging or would require more work than determining the parameters of the quality-control procedures currently used in operational numerical weather prediction. In fact, the parameters used in the current operational systems should provide invaluable information about the gross errors in the different types of observations, which could be used as guidance for the selection of the clipping values.

## Acknowledgments

This work was supported in part by Award KUS-C1-016-04 made by King Abdullah University of Science and Technology (KAUST). Mikyoung Jun's research was also partially supported by NSF Grants DMS-0906532 and DMS-1208421. Istvan Szunyogh acknowledges the support from ONR Grant N000141210785.

## REFERENCES

Anderson, E., and H. Järvinen, 1999: Variational quality control. *Quart. J. Roy. Meteor. Soc.*, **125**, 697–722.

Birmiwal, K., and J. Shen, 1993: Optimal robust filtering. *Stat. Decis.*, **11**, 101–119.

Birmiwal, K., and P. Papantoni-Kazakos, 1994: Outlier resistant prediction for stationary processes. *Stat. Decis.*, **12**, 395–427.

Burgers, G., P. J. van Leeuwen, and G. Evensen, 1998: Analysis scheme in the ensemble Kalman filter. *Mon. Wea. Rev.*, **126**, 1719–1724.

Calvet, L. E., V. Czellar, and E. Ronchetti, cited 2012: Robust filtering. [Available online at http://ssrn.com/abstract=2123477.]

Daley, R., 1991: *Atmospheric Data Analysis.* Cambridge Atmospheric and Space Science Series, Cambridge University Press, 457 pp.

Ershov, A. A., and R. S. Liptser, 1978: Robust Kalman filter in discrete time. *IEEE Trans. Autom. Remote Control*, **39**, 359–367.

Fahrmeir, L., and H. Kaufmann, 1991: On Kalman filtering, posterior mode estimation and Fisher scoring in dynamic exponential family regression. *Metrika*, **38**, 37–60.

Fahrmeir, L., and R. Kunstler, 1999: Penalized likelihood smoothing in robust state space models. *Metrika*, **49**, 173–191.

Fox, A. J., 1972: Outliers in time series. *J. Roy. Stat. Soc.*, **B34**, 350–363.

Genton, M. G., 2003: Breakdown-point for spatially and temporally correlated observations. *Developments in Robust Statistics,* R. Dutter et al., Eds., Springer, 148–159.

Genton, M. G., and A. Lucas, 2003: Comprehensive definitions of breakdown-points for independent and dependent observations. *J. Roy. Stat. Soc.*, **B65**, 81–94.

Genton, M. G., and A. Lucas, 2005: Discussion of “Breakdown and groups” by L. Davies and U. Gather. *Ann. Stat.*, **33**, 988–993.

Hampel, F. R., 1968: Contributions to the theory of robust estimation. Ph.D. thesis, University of California.

Harlim, J., and B. R. Hunt, 2007: A non-Gaussian ensemble filter for assimilating infrequent noisy observations. *Tellus*, **59A**, 225–237.

Houtekamer, P. L., and H. L. Mitchell, 1998: Data assimilation using an ensemble Kalman filter technique. *Mon. Wea. Rev.*, **126**, 796–811.

Huber, P. J., 1981: *Robust Statistics.* Wiley, 308 pp.

Ingleby, N. B., and A. C. Lorenc, 1993: Bayesian quality control using multivariate normal distributions. *Quart. J. Roy. Meteor. Soc.*, **119**, 1195–1225.

Kalman, R. E., 1960: A new approach to linear filtering and prediction problems. *J. Basic Eng.*, **82**, 34–45.

Kassam, S. A., and H. V. Poor, 1985: Robust techniques for signal processing: A survey. *Proc. IEEE*, **73**, 433–481.

Lorenz, E. N., and K. A. Emanuel, 1998: Optimal sites for supplementary weather observations: Simulation with a small model. *J. Atmos. Sci.*, **55**, 399–414.

Luo, X., and I. Hoteit, 2011: Robust ensemble filtering and its relation to covariance inflation in the ensemble Kalman filter. *Mon. Wea. Rev.*, **139**, 3938–3953.

Maronna, A., R. D. Martin, and V. J. Yohai, 2006: *Robust Statistics: Theory and Methods.* Wiley, 436 pp.

Martin, R. D., and A. E. Raftery, 1987: Robustness, computation and non-Euclidean models. *J. Amer. Stat. Assoc.*, **82**, 1044–1050.

Meinhold, R. J., and N. D. Singpurwalla, 1983: Understanding the Kalman filter. *Amer. Stat.*, **37**, 123–127.

Meinhold, R. J., and N. D. Singpurwalla, 1989: Robustification of Kalman filter models. *J. Amer. Stat. Assoc.*, **84**, 479–486.

Naveau, P., M. G. Genton, and X. Shen, 2005: A skewed Kalman filter. *J. Multivariate Anal.*, **95**, 382–400.

Ruckdeschel, P., 2010: Optimally robust Kalman filtering. Berichte des Fraunhofer ITWM 185, 53 pp.

Schick, I. C., and S. K. Mitter, 1994: Robust recursive estimation in the presence of heavy-tailed observation noise. *Ann. Stat.*, **22**, 1045–1080.

Schlee, F. H., C. J. Standish, and N. F. Toda, 1967: Divergence in the Kalman filter. *Amer. Inst. Aeronaut. Astronaut. J.*, **5**, 1114–1120.

Stockinger, N., and R. Dutter, 1987: Robust time series analysis: A survey. *Kybernetika*, **23**, 3–88.

Szunyogh, I., E. J. Kostelich, G. Gyarmati, E. Kalnay, B. R. Hunt, E. Ott, E. Satterfield, and J. A. Yorke, 2008: A local ensemble transform Kalman filter data assimilation system for the NCEP global model. *Tellus*, **60**, 113–130.

Tavolato, C., and L. Isaksen, 2010: Huber norm quality control in the IFS. *ECMWF Newsletter,* No. 122, ECMWF, Reading, United Kingdom, 27–31.

Tukey, J. W., 1970: *Exploratory Data Analysis.* Vol. 1. Addison-Wesley, 688 pp.

West, M., 1981: Robust sequential approximate Bayesian estimation. *J. Roy. Stat. Soc.*, **B43**, 157–166.

West, M., 1983: Generalized linear models: Scale parameters, outlier accommodation and prior distributions. *Bayesian Stat.*, **2**, 531–558.

West, M., 1984: Outlier models and prior distributions in Bayesian linear regression. *J. Roy. Stat. Soc.*, **B46**, 431–439.

Whitaker, J. S., and T. M. Hamill, 2002: Ensemble data assimilation without perturbed observations. *Mon. Wea. Rev.*, **130**, 1913–1924.