## 1. Introduction

Data assimilation (DA) refers to the estimation of oceanic–atmospheric fields by melding sensor data with a model of the dynamics under study. Most DA schemes are rooted in statistical estimation theory: the state of a system is estimated by combining all knowledge of the system, such as measurements and theoretical laws or empirical principles, in accord with their respective statistical uncertainties. The present challenge is that the state of the atmosphere and ocean system is complex, evolving on multiple time and space scales (Charnock 1981; Charney and Flierl 1981). Direct observations can be difficult and costly to acquire on a sustained basis, especially in oceanography. The large breadth of scales and variables also leads to costly and challenging numerical simulations. Future advances in coupled ocean–atmosphere estimation will thus require efficient assimilation schemes. In Part I of this two-part paper, the main goal is to develop the basis of a comprehensive, portable, and versatile four-dimensional DA scheme for the estimation and simulation of realistic geophysical fields. The adjective *realistic* emphasizes that the scheme should capture the time and space scales of the real processes of interest. It implies the use of real ocean data, as well as appropriate theoretical models and numerical resources. The primary focus is on the physics; acoustical, biological, and chemical phenomena will be investigated later. The implementation presented is compatible with the Harvard Ocean Prediction System (HOPS; e.g., Lozano et al. 1996; Robinson 1996b), and future work involves ocean–atmosphere data-driven estimation. In Part II (Lermusiaux 1999a) of this paper, identical twin experiments based on Middle Atlantic Bight shelfbreak front simulations are employed to assess and exemplify the capabilities of the present DA scheme.

A description of the goals and uses of DA, with a review of most methods, is given in Robinson et al. (1998a, and references therein). The issue is that, with most existing approaches, the combination of our practical, accuracy, and realism goals is difficult to satisfy (sections 3 and 4). Several directions have therefore been taken to obtain feasible schemes for realistic studies. Examples include simpler physics models to integrate errors (e.g., Dee et al. 1985; Dee 1990; Daley 1992b), variance-only error models (e.g., Daley 1991, 1992b), steady-state error models (e.g., Fukumori et al. 1993), and reduced-dimension or coarse-grid Kalman filters (KF; e.g., Fukumori and Malanotte-Rizzoli 1995). Other reductions involve the explicit computation of the non-null elements of linearized transfer matrices (e.g., Parrish and Cohn 1985; Jiang and Ghil 1993), banded approximations (e.g., Parrish and Cohn 1985), extended filters (e.g., Evensen 1993), ensemble and Monte Carlo methods (e.g., Evensen 1994a,b; Miller et al. 1994), and possibly the use of optimal and bred perturbations for assimilation (Ehrendorfer and Errico 1995; Toth and Kalnay 1993). It is important to realize that several of these attempts are based on incompatible hypotheses. Briefly, the coarse-grid KFs imply global-scale forecast errors, while the variance-only and banded approximations assume local errors. Pure Monte Carlo methods acknowledge the importance of nonlinear terms, extended schemes neglect their effects locally in time, and linearized methods neglect them at all times. The forward integration of error fields using simpler physics models assumes that the dominant predictability error is never correlated with the complex physics. Steady-state error models are somewhat limited to fixed data arrays and statistically steady dynamics. For each attempt, the list of arguments for and against is long. Even though most a priori reduced methods have been successful data interpolators, these conflicting hypotheses have understandably led to controversies.

If one accepts that, in general, relatively little is known about dynamical and observational error fields, it is rational to limit the a priori assumptions. For the present comprehensive aims, the conditional mean, which minimizes several cost functionals or estimation criteria, is chosen as the optimal estimate. An approximation to the estimation criterion is obtained using heuristic characteristics of geophysical measurements and models. The resulting suboptimal approach is based on an objective, evolving truncation of the number and dimension of the parameters that characterize the conditional probability or error space. The ideal error (probability) subspace spans and tracks the scales and processes where the dominant errors (low probabilities) occur. The notion of dominant is naturally defined by the error measure used in the chosen optimal criterion: each estimation criterion has a corresponding error subspace definition. For the present minimum error variance approach, the logical definition yields an evolving subspace of variable size, characterized by error singular vectors and values or, equivalently, by the error empirical orthogonal functions (EOFs) and coefficients. Data assimilation via error subspace statistical estimation (ESSE) combines data and dynamics in accord with their respective dominant uncertainties. Once successfully applied, ESSE can rationally validate specific a priori error truncations for future tailored applications. Organizing the error space by relative importance in fact provides a theoretical basis for the quantitative intercomparison of today’s numerous reduced-dimension methods. A first issue, of course, is the meaning and validity of the truncation of geophysical error spaces (section 5 and Part II). Another is that it is easy to define the concept of an evolving subspace, but harder to determine mathematical systems that describe and track its evolution.
Most of the schemes for filtering and smoothing via ESSE (derived next) have variations. Focusing on the dominant errors fosters dynamical model testing and corrections. The error subspace also helps to identify areas and variables for which observations are most needed. ESSE provides a feasible quantitative approach to both dynamical model improvements and adequate sampling. Historically, these have been challenging issues. The accurate specification and tracking of the dominant errors hence appears of paramount importance from a fundamental point of view.

The text is organized as follows. Selected definitions and generalities are stated in section 2. Section 3 deals with the focus and specific objectives of the paper. Section 4 develops the main issues in realistic and comprehensive data assimilation today: the efficient reduction of error models and the powerful use of all information present in the few observations available. Section 5 addresses the meaning of the variations of variability and error subspaces. The correlations in geophysical systems are emphasized and the ESSE criteria introduced. Sections 6 and 7 derive schemes for filtering and smoothing via ESSE, respectively. These filtering and smoothing algorithms with nonlinear systems are succinctly compared with existing “suboptimal” procedures. Section 8 consists of the summary and conclusions. Appendix A describes most of the notation and assumptions. Appendix B addresses important specifics and variations of the ESSE schemes presented.

## 2. Definitions and generalities

A *dynamical model* is an approximation of the basic laws to the phenomena and scales of interest. It defines the time–space evolution of the state variables and, in continuous form (appendix A), reads

$$d\psi = \mathbf{f}(\psi, t)\,dt. \qquad \text{(1a)}$$

*Dynamical variability* (true or model) refers to the statistics of the difference between the dynamical system evolution (true or model) and a reference mean state. The model (1a) usually considers both the variability and the mean state. The *measurement model*^{1} is a *directed* relation linking the state variables to the observations (appendix A):

$$\mathbf{d}_k = \mathbf{C}_k\,\psi_k, \qquad \text{(1b)}$$

where the data $\mathbf{d}_k$ can be sensor, feature-modeled, or structured data (Lozano et al. 1996). For simplicity, (1b) is assumed locally linear; integrating the nonlinear dynamics (1a) is often more critical. In ocean and atmosphere estimation, the determination of efficient measurement models is important. Depending on the sensor and state variables, they can be simple and straightforward, as for the link between the heat equation and temperature data, or complex and indefinite, as for the link between coupled physical–biological equations and remotely sensed data (e.g., ocean color, surface height, or temperature). By nature, (1b) is a well-posed mapping from a large state space to a usually much smaller data space, but its inverse is not mathematically defined by (1b). For the inverse to be well posed, an additional data–dynamics melding criterion is required. The dynamical constraints enhance the observations and vice versa; this dual feedback is essential.
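To make the last points concrete, the following minimal sketch (Python with NumPy; the grid size, observed indices, and values are hypothetical) builds a linear measurement model $\mathbf{C}_k$ that samples two of six grid points. The forward map (1b) is well defined, but $\mathbf{C}_k$ has a nontrivial null space, so distinct states produce identical data and (1b) alone cannot be inverted.

```python
import numpy as np

# Hypothetical toy setup: n = 6 grid-point temperatures, only points 1 and 4 observed.
n = 6
observed = [1, 4]

# C maps the state space R^n to the data space R^m: each row selects one grid point.
C = np.zeros((len(observed), n))
for row, idx in enumerate(observed):
    C[row, idx] = 1.0

psi = np.array([10.0, 11.0, 12.5, 13.0, 12.0, 11.5])  # assumed true state
d = C @ psi                                            # noise-free data, as in (1b)

# The forward map is well defined, but C has an (n - m)-dimensional null space:
# changing the state only at unobserved points leaves the data unchanged.
null_dim = n - np.linalg.matrix_rank(C)
psi_alt = psi.copy()
psi_alt[[0, 2, 3, 5]] += 5.0
print(d, null_dim, np.allclose(C @ psi_alt, d))
```

The four-dimensional null space is exactly what an additional melding criterion must resolve.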

An *estimation or melding criterion* determines the respective influence of the dynamics and observations on the state estimate. A *DA system* consists of three components: a dynamical model, a measurement model with sensor data, and an estimation criterion. The data assimilation problem is to determine the best possible field estimate that is in accord with the dynamical and measurement models, within their respective uncertainties. By “best” it is meant “in closest statistical accord with the truth.” The notion of close accord is defined mathematically by the estimation criterion. Within the criterion, all constraints are a priori weak, but to ease computations some may be assumed strong, depending on the relative estimated accuracy of each constraint. For instance, the nonlinear dynamical model can be considered either as a strong or as a weak constraint (e.g., Reid 1968; Sasaki 1970; Daley 1991; Bennett 1992; Wunsch 1996). Finally, it must be remembered that the central notion of an optimal estimate is a function of the melding criterion and its associated statistical hypotheses. The ultimate arbiter is the use of the optimal estimate for ocean prediction.

## 3. Focus and specific objectives

A comprehensive assimilation system is suited to most nonlinear oceanic–atmospheric phenomena and most data types and coverage. Even though weather and ocean modeling differ (dynamics, data and models available, constraints), the approach chosen is general. The main restriction is a focus on the synoptic/mesoscale circulation and processes. The assimilation period is left arbitrary; it could be days to years. It is only assumed that the observations are time–space adequate with regard to the predictability limits of the phenomena of interest. This notion of adequate is subtle since even for known predictability limits and validated models (1a)–(1b), data sufficiency is still a function of the assimilation scheme. The better the scheme, the fewer the necessary data (e.g., Lorenc et al. 1991; Todling and Ghil 1994). The application considered in Part II mainly involves simulated temperature and salinity data for the control of shelfbreak front phenomena.

As for specific objectives, the scheme should address model nonlinearities, estimate the uncertainty of the forecast with sufficient information on a posteriori errors, be suitable for real-time assimilation as well as for parallel computations, and allow adaptive filtering and parameter estimation. Most of the existing optimal schemes (section 4) cannot yet satisfy these practical objectives. The information, accuracy, and realism goals may also limit the success of today’s operational methods, such as optimal interpolation (OI), commonly used in weather forecasting (Bengtsson et al. 1981; Ghil 1989; Lorenc et al. 1991) and in real-time, at-sea ocean prediction (Robinson et al. 1996a,b). One would like to improve the existing nowcast/forecast capabilities and, ideally, increase general understanding; data assimilation should feed back to fundamental science.

## 4. Constraints and issues

The assimilation problem of section 2 is a nonlinear statistical estimation problem. Existing nonlinear schemes were first analyzed with regard to our goals. Here, we only focus on the essentials of nonlinear schemes and issues, and provide references for more complete discussions. Within estimation theory, *Bayesian estimation* and *maximum likelihood estimation* are the common approaches (e.g., Jazwinski 1970; Gelb 1974; Lorenc 1986; Boguslavskij 1988). In control theory, the DA problem is seen as a deterministic optimization problem, and for quadratic cost functions it amounts to *weighted least squares estimation* (Le Dimet and Talagrand 1986; Tarantola 1987; Sundqvist 1993). Statistical assumptions can be implicitly associated with all approaches (Robinson et al. 1998a). The discussion that follows involves Bayesian ideas.

The Bayesian conditional mean is the optimal estimator with respect to several cost functionals for arbitrary statistics. In this study, it is chosen as the optimal estimate. For nonlinear systems, it depends on all moments of the conditional density function, hence on an infinite number of parameters. Solving for the conditional probability density governed by the Fokker–Planck or Kushner equation (Jazwinski 1970) is today an informative guide for simple systems (Miller et al. 1994; Miller et al. 1998). In realistic nonlinear ocean–atmosphere estimation, the aim is to approximate the quantities of primary interest: the state and its uncertainty. Taylor expansions and local linear extensions yield the common approximate schemes: the linearized, extended, higher-order, and iterated Kalman filters/smoothers (KF/KS) and associated simplifications (e.g., Boguslavskij 1988). They provide an estimate of the uncertainty, but their truncated expansions may diverge (Evensen 1993, 1994a,b) and require frequent reinitialization. Most of the control and weighted least squares methods were derived for linear systems but can be iterated locally to solve nonlinear generalized inverse problems (e.g., Bennett et al. 1996; Bennett et al. 1997). The representer method considers model errors and minimizes the cost function in the data space, but it can be as costly as the estimation schemes if a posteriori state error covariance estimates are required. Direct global minimizations (e.g., Robinson et al. 1998a) are alternatives, but a physically realizable solution is not assured. To limit expensive model integrations, a good first guess is required. In fact, the convergence of the iterated weighted least squares schemes is not yet proven (Evensen and van Leeuwen 1996). Smoothing prior to the minimization appears necessary (section 7).

The main advantages of all the above methods are the update and dynamical forecast of the error covariance. Even with linear models, forecast errors can differ considerably from the ones currently prescribed in OI (Parrish and Cohn 1985; Cohn and Parrish 1991; Miller and Cane 1989; Todling and Ghil 1990; Daley 1992a–c). Nonetheless, this error forecast and update is very expensive. For nonlinear systems, the required dimension is infinite. For discrete linear systems with *n* degrees of freedom, the error covariance has $O(n^2)$ elements and its forecast requires $O(n)$ model integrations, with *n* of $O(10^5$–$10^6)$ in realistic applications.
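These orders of magnitude can be checked with simple arithmetic. The sketch below assumes double-precision (8-byte) storage and an illustrative error subspace of $p = 100$ vectors, a value chosen here only for comparison; it contrasts the memory for a full $n \times n$ covariance with that of an $n \times p$ basis plus a $p \times p$ covariance.

```python
BYTES_PER_FLOAT = 8  # double precision (assumption)

def full_covariance_bytes(n: int) -> int:
    """Storage for a dense n x n error covariance (symmetry ignored)."""
    return n * n * BYTES_PER_FLOAT

def error_subspace_bytes(n: int, p: int) -> int:
    """Storage for an n x p subspace basis plus a p x p subspace covariance."""
    return (n * p + p * p) * BYTES_PER_FLOAT

for n in (10**5, 10**6):
    full = full_covariance_bytes(n)
    reduced = error_subspace_bytes(n, p=100)  # p = 100 is illustrative only
    print(f"n={n:>8}: full {full / 1e9:,.0f} GB, subspace {reduced / 1e9:.2f} GB")
```

For $n = 10^6$ the full covariance alone needs about 8 TB, whereas the rank-100 factors need under 1 GB.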

There are two constraints that a DA system needs to address. First, the dimension of the full error model associated with (2a)–(2b) is too large. Since less is known about errors than about dynamics, a careful reduction is necessary, and the a priori hypotheses should be limited. With experience, for specific regions and processes, and particular data properties, some of today’s hypotheses (e.g., Todling and Cohn 1994; appendix B) may be validated for use as a priori information. Second, even though data coverage, type, and quality have increased in the past decades, datasets remain limited. This concern is of special relevance in oceanography (Robinson et al. 1998a). To optimize the assimilation, it is very important to use at once all the information contained in the few observations available. In summary, the issues are (a) how to reduce the size of the error model while explaining as accurately as possible the error structure and amplitude (many degrees of freedom, limited resources) and (b) how to optimize the extraction of reliable information from observations limited both in type and coverage.

## 5. Error subspace statistical estimation

Estimation criteria that address the objectives and questions raised in sections 3–4 are now identified. The approach is dynamic, reflecting basic properties of oceanic–atmospheric systems. Based on essential characteristics of geophysical measurements and models, the first property argues that efficient bases to describe, respectively, the dynamics, the variations of variability, and the errors, exist and are meaningful (section 5a). Section 5a is detailed since confusions between these evolving bases lead to controversies. The second property relates to the correlations between geophysical fields (section 5b). These facts are then used to determine the optimal representation of the error and a criterion for data–dynamics melding, leading to the ESSE concept (section 5c).

### a. Efficient basis for describing variations of variability and error subspace

Even though it still needs to be rigorously proven that many observed geophysical phenomena can be associated with phase-space attractors of low, finite dimension (West and Mackey 1991; Osborne and Pastorello 1993), statistical data analysis at least indicates that most geophysical phenomena are colored random processes, for example, red in time and space. Nonetheless, some field observations (e.g., satellite imagery) exhibit geophysical features (Monin 1974; Robinson 1989) at most energetic scales.^{2} These structures occur intermittently, with strong similarities between occurrences. Several form anisotropic, nonhomogeneous, multiscale but coherent dynamical fields. Hence, most observed geophysical phenomena develop structures at many scales and have colored spectra. At the dynamical modeling end, many driven-dissipative systems as well as nonlinear conservative systems have been shown to possess “attractors” of finite dimension (e.g., Bergé et al. 1988; Osborne and Pastorello 1993). The existence of a global attractor for the Navier–Stokes equations has been proven in two dimensions (Foias and Temam 1977) and, in three dimensions, for fields that remain smooth (Foias and Temam 1987), with the dimension of the attractor being a function of the Reynolds number. As can also be deduced from Kolmogorov’s physical principles (Kolmogorov 1941), the dynamical systems approach thus implies that the number of degrees of freedom necessary to describe most synoptic/mesoscale-to-large-scale turbulent geophysical flows is limited (Temam 1991).

The high dimension of (1a) is dictated more by numerical accuracy than by the dimensionality of the physical variability. The dual properties of observations and models mentioned above imply that the dominant geophysical variability consists of dynamical patterns and structures, with, in general, a colored spectrum that can be efficiently described at each instant by a limited number of functions or modes. The time–space physical nature of these functions evolves with the system’s position in phase space and with the local structure of the attractor, if it exists (e.g., Anderson 1996). In practice, the ideal choice of functions is a concern; it defines the evolving dynamics subspace. One aim is to reduce the number of functions to a minimum while still describing most of the variability of interest. Common techniques are dynamical normal modes or singular vectors, empirical modes or EOFs (Lorenz 1965; Davis 1977b; Weare and Nasstrom 1982; Wallace et al. 1992; von Storch and Frankignoul 1998), principal oscillation and interaction patterns (POPs and PIPs; Hasselmann 1988; von Storch et al. 1988; Penland 1989; Schnur et al. 1993), and radial functions and wavelets (e.g., Gamage and Blumen 1993).
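As an illustration of how a limited number of EOFs can capture most of a colored, structured variability, the sketch below (Python with NumPy) builds synthetic snapshots from three coherent spatial patterns with decaying amplitudes plus weak noise, and computes the EOFs as the left singular vectors of the anomaly matrix. The patterns, amplitudes, and noise level are assumptions made for the example, not properties of any ocean dataset.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic variability on n = 200 grid points and T = 500 snapshots:
# three coherent patterns with decaying amplitudes plus weak white noise.
n, T = 200, 500
x = np.linspace(0.0, 1.0, n)
patterns = np.stack([np.sin(np.pi * x), np.sin(2 * np.pi * x), np.sin(3 * np.pi * x)], axis=1)
amplitudes = rng.standard_normal((3, T)) * np.array([[3.0], [1.5], [0.7]])
fields = patterns @ amplitudes + 0.05 * rng.standard_normal((n, T))

# EOFs: left singular vectors of the anomaly matrix (time mean removed).
anomalies = fields - fields.mean(axis=1, keepdims=True)
U, s, _ = np.linalg.svd(anomalies, full_matrices=False)
variance_fraction = s**2 / np.sum(s**2)

print(np.round(variance_fraction[:5], 4))
print("variance explained by 3 EOFs:", variance_fraction[:3].sum())
```

Although the field has 200 degrees of freedom, three EOFs explain nearly all of its variance in this constructed case.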

The leap from the above reduction of variability to an evolving, efficient reduction of the error model (section 4) is now made in three successive steps. The model errors and available data are first assumed null; that is, $d\mathbf{w} = 0$ and $\mathbf{d}_k = \mathbf{v}_k = 0$ for $k > 0$ (appendix A). The true and model dynamics then differ only through the predictability error. The uncertainties are a subset of the local variations of variability, which have structural and spectral properties analogous to those of the dynamics subspace. If model errors and data are nonexistent, the dominant uncertainties can thus be described by a finite set of state-space vectors, and the additional error variance explained by each new vector decays rapidly (e.g., hyperbolic, exponential, or power decay).

Second, model errors are now included. The difference between the true state $\psi^t$ and its expected value $\hat{\psi}^t$ then evolves according to

$$d\psi^t - d\hat{\psi}^t = [\mathbf{f}(\psi^t, t) - \hat{\mathbf{f}}(\psi^t, t)]\,dt + d\mathbf{w},$$

where $\hat{\mathbf{f}}$ is the expected value of $\mathbf{f}$ (appendix A). For meaningful modeling, the ratio of model to predictability errors should not be much larger than one. If at any given time model errors have amplitudes similar to predictability errors, they represent energetic physical processes not captured by the deterministic model (1a). When model errors are important, they are thus local variations of variability, and the previous structured, power-decay property applies. Model error covariances $\mathbf{Q}(t)$ …^{3} Combining the three conclusions, a time-evolving, limited-dimension subspace contains most of the error. It is influenced by the dynamics, the data, and the model errors.
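The practical consequence of a rapidly decaying error spectrum is that a small subspace dimension suffices. The sketch below, assuming hypothetical spectra of 1000 error variances, returns the smallest $p$ whose leading variances explain 90% of the total; a flat (white) spectrum is included for contrast.

```python
import numpy as np

def rank_for_fraction(variances, fraction=0.9):
    """Smallest p whose leading variances explain at least `fraction` of the total."""
    v = np.sort(np.asarray(variances, dtype=float))[::-1]
    cumulative = np.cumsum(v) / v.sum()
    return int(np.searchsorted(cumulative, fraction) + 1)

i = np.arange(1, 1001)                # 1000 hypothetical error modes
exponential = np.exp(-i / 10.0)       # exponential decay
power = i ** -2.0                     # power-law decay
flat = np.ones_like(i, dtype=float)   # white spectrum, for contrast

for name, spectrum in [("exponential", exponential), ("power", power), ("flat", flat)]:
    print(name, rank_for_fraction(spectrum, 0.9))
```

With these assumed spectra, the exponential and power-law cases need only 24 and 6 vectors, respectively, while the flat spectrum needs about 900.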

### b. Correlations in geophysical systems

The limited datasets issue is addressed by the multiscale, multivariate correlations between geophysical field variations. For instance, dynamical and statistical studies show that a tracer transect is related to the reference velocity in an ocean basin (Wunsch 1988); the type and strength of local precipitation inform us about remote weather; an El Niño event can lead to abnormal conditions at other times and places; the surface temperature in the Gulf Stream at a given time implies some of the deeper water properties; the upwelling/downwelling variations along a coast relate to the coastal distribution of nutrients; each fish species has optimum water properties; the sea color correlates with phytoplankton concentration; and so on. These examples appear simple, but in reality correlation issues are subtle owing to multiscale, multivariate, inhomogeneous, anisotropic, or nonstationary properties (e.g., McWilliams et al. 1986; Lorenc 1986, 1992; Daley 1991). In summary, 3D multivariate DA, in accord with the phenomena and time–space scales considered, is necessary. The expression *3D multivariate DA* indicates here that each datum instantaneously influences all state variables and scales that matter. This impact must be in accord with the evolving dynamics. For instance, surface altimeter fields are not always instantaneously linked to subsurface properties (e.g., Martel and Wunsch 1993; Fukumori et al. 1993).
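A two-variable sketch illustrates how a single datum can correct an unobserved variable through error cross-covariances. The state, covariance, and observation values below are hypothetical, and the update uses the standard minimum error variance (Kalman) gain, written formally in section 6a.

```python
import numpy as np

# Hypothetical two-variable state: [surface temperature, deep temperature] in deg C.
psi_forecast = np.array([18.0, 8.0])

# Assumed forecast error covariance with a surface-deep error correlation of 0.8.
sigma_s, sigma_d, rho = 1.0, 0.5, 0.8
P = np.array([[sigma_s**2, rho * sigma_s * sigma_d],
              [rho * sigma_s * sigma_d, sigma_d**2]])

C = np.array([[1.0, 0.0]])   # only the surface temperature is observed
R = np.array([[0.25]])       # observation error variance
d = np.array([19.0])         # the observation

# Minimum error variance linear update: the cross-covariance P[1, 0]
# lets the surface datum also correct the unobserved deep value.
K = P @ C.T @ np.linalg.inv(C @ P @ C.T + R)
psi_analysis = psi_forecast + K @ (d - C @ psi_forecast)
print(K.ravel(), psi_analysis)
```

With these assumed values, the 1°C surface innovation raises the unobserved deep temperature by 0.32°C.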

### c. ESSE

The conditional mean was chosen as the optimal estimate. For all statistics, minimizing the expectation of a convex measure of the estimate’s uncertainty leads to that optimum (appendix B, section d). Approximate definitions of uncertainty and of its measures lead to approximate estimates. Here, both notions are determined using sections 5a and 5b.

^{4}The first conclusion (section 5a) supports the truncation of the error covariance to a most energetic error subspace, while the second (section 5b) implies that the melding criterion should be 3D multivariate. A reduced-rank approximation of the error covariance $\mathbf{P}_k = \mathbf{P}(t_k)$ is thus optimal for a given rank $p_k$ if it explains as much of the variance and structure of the multivariate $\mathbf{P}_k$ as possible. For the structured, power-decay property (section 5a), there is a relatively small number $p_k$ for which the optimal reduction explains most of $\mathbf{P}_k$. Denoting for convenience this time-variant $p_k$ by $p$, the associated reduction is called the principal error covariance $\mathbf{P}^p_k$. Writing

$$\mathbf{P}_k = \mathbf{P}^p_k + \mathbf{P}^c_k,$$

where $\mathbf{P}^c_k$ is the complement of the principal error covariance, the ES at $t_k$ is defined by

$$\min_{\mathbf{P}^p_k} \|\mathbf{P}^c_k\| = \min_{\mathbf{P}^p_k} \|\mathbf{P}_k - \mathbf{P}^p_k\|, \quad \text{with} \quad \mathbf{P}^p_k = \mathbf{E}_k\boldsymbol{\Pi}_k\mathbf{E}_k^T. \qquad \text{(5)}$$

The principal error covariance thus has amplitude $\boldsymbol{\Pi}_k \in \mathbb{R}^{p\times p}$ and structure $\mathbf{E}_k \in \mathbb{R}^{n\times p}$, and hence satisfies $\operatorname{rank}(\mathbf{P}^p_k) \le p$. For unitarily invariant norms (e.g., the spectral norm $\|\mathbf{P}^c_k\|_2$ and the Frobenius norm $\|\mathbf{P}^c_k\|_F$), the optimum in (5) is the dominant rank-$p$ singular value decomposition of $\mathbf{P}_k$ (Horn and Johnson 1985, 1991). The matrix $\boldsymbol{\Pi}_k$ is the ordered diagonal of the dominant $p$ singular values, and the columns of $\mathbf{E}_k$ are the associated singular vectors. Since $\mathbf{P}_k$ is positive semidefinite, $\mathbf{E}_k\boldsymbol{\Pi}_k\mathbf{E}_k^T$ is also its dominant rank-$p$ eigendecomposition. The columns of $\mathbf{E}_k$ form a basis for the 3D multivariate error subspace; $\boldsymbol{\Pi}_k$ is the error subspace covariance. At a given time $t_k$ and for a given $p$, these matrices characterize the error subspace (ES). They answer the first issue raised in section 4.
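The optimality of the truncated eigendecomposition in (5) can be checked numerically. The sketch below (Python with NumPy) builds a hypothetical $50 \times 50$ covariance with geometrically decaying eigenvalues, forms $\mathbf{E}_k$, $\boldsymbol{\Pi}_k$, and $\mathbf{P}^p_k$ for $p = 5$, and verifies the Eckart–Young residuals: the spectral norm of the complement equals the $(p+1)$th eigenvalue, and its Frobenius norm equals the root sum of squares of the discarded eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 5

# Hypothetical error covariance: random orthonormal basis, geometrically decaying spectrum.
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
P = Q @ np.diag(10.0 * 0.5 ** np.arange(n)) @ Q.T

# Dominant rank-p eigendecomposition (eigh returns eigenvalues in ascending order).
eigvals, eigvecs = np.linalg.eigh(P)
order = np.argsort(eigvals)[::-1]
sorted_eigs = eigvals[order]
E = eigvecs[:, order[:p]]          # structure, n x p
Pi = np.diag(sorted_eigs[:p])      # amplitude, p x p
P_p = E @ Pi @ E.T                 # principal error covariance
P_c = P - P_p                      # complement

# Eckart-Young residuals of the truncation.
print(np.linalg.norm(P_c, 2), sorted_eigs[p])
print(np.linalg.norm(P_c, "fro"), np.sqrt(np.sum(sorted_eigs[p:] ** 2)))
print("fraction of error variance in the ES:", np.trace(Pi) / np.trace(P))
```

Here five of fifty directions hold about 97% of the assumed error variance.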

With the ES defined at $t_k$, only the error measure within this ES still needs to be chosen. Several exist (e.g., Horn and Johnson 1985), but the most logical is the Euclidean one. Combining criterion (5) with the Euclidean minimum error variance approach leads to the present notion of ESSE: data and dynamics are melded such that the a posteriori ES variance is minimized. The estimation criteria are

$$\min_{\hat{\psi}_k} \operatorname{tr}[\boldsymbol{\Pi}_k(+)] \quad \text{at a fixed time } t_k; \qquad \text{(6a)}$$

$$\min_{\hat{\psi}_k} \operatorname{tr}[\boldsymbol{\Pi}_k(+)] \quad \text{at each successive data time } t_k; \text{ and} \qquad \text{(6b)}$$

$$\min_{\hat{\psi}(t)} \operatorname{tr}[\boldsymbol{\Pi}(t)(+)] \quad \text{for all } t \text{ within the data interval } [t_0, t_N]; \qquad \text{(6c)}$$

where each of (6a)–(6c) is subject to the dynamical and measurement model constraints (2a)–(2b). To distinguish between a priori and a posteriori quantities, the symbols (−) and (+) are employed (see appendix A for more on notation). Except for neglecting the small errors outside the ES, criteria (6a)–(6c) follow the Bayesian minimum error variance nonlinear estimation. They lead to efficient, 3D multivariate analyses since the meldings occur within the ES, which answers the second question raised in section 4. The criteria (6a)–(6c) also relate to the efficient concept of “minimax assimilation”: the maximum errors, here in the Euclidean sense, are minimized. The general goal of ESSE is to determine the considered ocean–atmosphere state evolution by minimizing the dominant errors, in accord with the full dynamical and measurement models and their respective uncertainties. Of course, the criteria (6a)–(6c) are only theoretical. In practice, efficient schemes for finding (6a) and tracking (6b)–(6c) the ES have to be determined (sections 6 and 7).

Objective analysis via ESSE (6a), or “fixed-time ESSE,” emphasizes the general applicability of the approach. In fact, Anderson (1996) has shown that for the Lorenz model (Lorenz 1963), the projection of the classical OA error correlation onto the local attractor sheet is the most effective selection of initial conditions for ensemble forecasts. This conclusion has also been exemplified in primitive-equation (PE) modeling (Lermusiaux 1997). Since the ES is a subset of the local variations of variability, (6a) is the most efficient analysis for given resources in the Euclidean framework. In Part II of this study, the focus is on filtering via ESSE (6b). The problem statement is given in Table 1.
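The minimum error variance property underlying (6a)–(6c) can be illustrated in the full space with the Joseph form of the a posteriori covariance, which holds for any linear gain. In the sketch below (Python with NumPy; all matrices are hypothetical, and an ES version would replace $\mathbf{P}$ by $\mathbf{E}\boldsymbol{\Pi}\mathbf{E}^T$), perturbing the optimal gain always increases the trace of the a posteriori error covariance.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 4, 2
A = rng.standard_normal((n, n))
P = A @ A.T + np.eye(n)            # hypothetical forecast error covariance
C = rng.standard_normal((m, n))    # hypothetical linear measurement model
R = 0.5 * np.eye(m)                # observation error covariance

def posterior_cov(K):
    """Joseph form: a posteriori error covariance of a linear update with any gain K."""
    I_KC = np.eye(n) - K @ C
    return I_KC @ P @ I_KC.T + K @ R @ K.T

K_opt = P @ C.T @ np.linalg.inv(C @ P @ C.T + R)
trace_opt = np.trace(posterior_cov(K_opt))

# Random perturbations of the gain all increase the a posteriori error variance.
traces = [np.trace(posterior_cov(K_opt + 0.1 * rng.standard_normal((n, m)))) for _ in range(100)]
print(trace_opt, min(traces))
```

At the optimal gain, the Joseph form reduces to $\mathbf{P} - \mathbf{K}\mathbf{C}\mathbf{P}$, the simpler update used in section 6a.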

Section 6 outlines filtering via ESSE schemes using a Monte Carlo approach; dynamical systems for adaptive ESSE are also derived in Lermusiaux (1997). The smoothing via ESSE problem statement is as in Table 1, but with criterion (6c) replacing (6b). For linear models (2a), this corresponds to the generalized inverse problem restricted to the dominant errors. Such smoothing schemes with a posteriori error estimates are obtained in section 7. The five main components of the present ESSE system (initialization, field and ES nonlinear forecasts, minimum variance within the ES, and smoothing) are illustrated in Fig. 1.

Some general properties are now discussed. First, in (6a)–(6c) the data only correct the most erroneous parts of the forecast; the accurate portion of the forecast is corrected by dynamical adjustments and interpolations. Second, the ES in (5) is time dependent: the reduction to the principal error covariance is dynamic, and the ES tracks the scales and processes where the dominant errors occur. The time rate of change of the ES is a function of (i) the initial uncertainty conditions; (ii) the evolving model errors; (iii) the data type, quality, and coverage; and the nonlinear, interactive evolution of these three components. All these factors influence the nature of the ES (e.g., whether it is multiscale, anisotropic, or homogeneous). In fact, the successful use of optimal perturbations (OP) to determine initial conditions for ensemble forecasting (e.g., Mureau et al. 1993; Toth and Kalnay 1993; Molteni and Palmer 1993; Molteni et al. 1996) can be improved by quantitatively taking into account data quality and model errors. The OP spectrum and structures, which only consider the predictability error, should be modified accordingly, especially in oceanography. ESSE provides a theoretical framework to do so. Third, the error and dynamics subspaces in general differ. Fourth, the statistical estimation of the ES (sections 6 and 7) yields the notion of error EOFs. In fact, there are several quantitative definitions for the ES, each associated with slightly modified criteria (6a)–(6c): for example, singular or normal error modes; extended, complex, or frequency error EOFs; error POPs and PIPs; and synoptic and wavelet-based ES. If the maximum norm had been chosen in (5), a maximum measure would have been logical in (6a)–(6c). These particular formulations are not discussed further; priority is given to the central concept, common to all representations.
Fifth, the ideal ES dimension *p* (e.g., appendix B, section b) evolves in time, in accord with the dynamics, model errors, and available data. It is only for statistically stationary data, dynamics, and model errors that *p* should stay constant. In all schemes of sections 6 and 7, the size of the ES is thus time dependent.

## 6. Filtering via ESSE schemes

A recursive scheme is now derived for filtering in the ES (5) corresponding to the models (2a)–(2b). One needs to track the ES evolution, which is not trivial. The two-step core of the algorithm consists of a forecast–data melding when data are available (section 6a), followed by the dynamical state and ES forecasts to the next assimilation time (section 6b). It is assumed that an estimate of the conditional mean state and the associated ES have been integrated from $t_0$ to $t_k$ using (1)–(2) and (6b) for $[\mathbf{d}_0, \ldots, \mathbf{d}_{k-1}]$. Hence, $\hat{\psi}_k(-)$, $\mathbf{E}_k(-)$, and $\boldsymbol{\Pi}_k(-)$ are available. Specifics and variations of the algorithm are discussed in appendix B.

### a. ES melding or ES analysis scheme

As in most existing schemes, the data–forecast melding is assumed linear a priori. This departs from the strict Bayesian approach (e.g., Jazwinski 1970), which would solve (6a)–(6c) without this simplification. Nonetheless, the melding weights are determined using criterion (6b) in the nonlinearly evolved ES (5). To simplify notation, the index *k* is omitted. The minimum error covariance melding is first decomposed exactly in terms of error eigenvalues and eigenvectors [section 6a(1)]; its ES truncation is given in Lermusiaux (1997). In section 6a(2), the sample or empirical ES melding is derived.

#### 1) Eigendecomposition of the minimum error variance linear update

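Denoting by $\mathbf{P}(-)$ the a priori error covariance with respect to the true state $\psi^t$, the minimum error variance linear update yields the melded estimate

$$\hat{\psi}(+) = \hat{\psi}(-) + \mathbf{K}[\mathbf{d} - \mathbf{C}\hat{\psi}(-)],$$

with a posteriori error covariance and gain

$$\mathbf{P}(+) = \mathbf{P}(-) - \mathbf{K}\mathbf{C}\mathbf{P}(-), \qquad \mathbf{K} = \mathbf{P}(-)\mathbf{C}^T[\mathbf{C}\mathbf{P}(-)\mathbf{C}^T + \mathbf{R}]^{-1}.$$

The a priori and a posteriori error covariances are decomposed into their eigenvectors and eigenvalues,

$$\mathbf{P}(-) = \mathbf{U}_-\boldsymbol{\Lambda}(-)\mathbf{U}_-^T, \quad \mathbf{U}_-^T\mathbf{U}_- = \mathbf{I}; \qquad \mathbf{P}(+) = \mathbf{U}_+\boldsymbol{\Lambda}(+)\mathbf{U}_+^T, \quad \mathbf{U}_+^T\mathbf{U}_+ = \mathbf{I},$$

and one defines $\tilde{\mathbf{C}} \doteq \mathbf{C}\mathbf{U}_-$. The eigendecomposition of the nonnegative definite matrix

$$\tilde{\boldsymbol{\Lambda}}(+) \doteq \boldsymbol{\Lambda}(-) - \boldsymbol{\Lambda}(-)\tilde{\mathbf{C}}^T[\tilde{\mathbf{C}}\boldsymbol{\Lambda}(-)\tilde{\mathbf{C}}^T + \mathbf{R}]^{-1}\tilde{\mathbf{C}}\boldsymbol{\Lambda}(-) = \mathbf{H}\boldsymbol{\Lambda}(+)\mathbf{H}^T, \tag{14}$$

where $\boldsymbol{\Lambda}(+)$ is diagonal and the columns of $\mathbf{H} \in \mathbb{R}^{n\times n}$ are a set of orthonormal eigenvectors for $\tilde{\boldsymbol{\Lambda}}(+)$, then gives

$$\mathbf{P}(+) = \mathbf{U}_+\boldsymbol{\Lambda}(+)\mathbf{U}_+^T = \mathbf{U}_-\mathbf{H}\boldsymbol{\Lambda}(+)\mathbf{H}^T\mathbf{U}_-^T.$$

Hence, $\boldsymbol{\Lambda}(+)$ is given by (14) and the a posteriori error eigenvectors by

$$\mathbf{U}_+ = \mathbf{U}_-\mathbf{H}.$$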
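As a numerical check of this eigendecomposition, the following sketch (Python with NumPy) computes $\mathbf{U}_+$ and $\boldsymbol{\Lambda}(+)$ from $\tilde{\boldsymbol{\Lambda}}(+)$ and compares the result with the direct update $\mathbf{P}(-) - \mathbf{K}\mathbf{C}\mathbf{P}(-)$. The function name `eig_update` and the random test matrices are illustrative assumptions, not part of the scheme.

```python
import numpy as np

def eig_update(P_minus, C, R):
    """Minimum error variance update expressed in the eigenbasis of P(-).

    Returns U_plus and the diagonal lam_plus of Lambda(+), sorted in
    decreasing order, such that P(+) = U_plus @ diag(lam_plus) @ U_plus.T.
    """
    lam_minus, U_minus = np.linalg.eigh(P_minus)        # P(-) = U_- Lambda(-) U_-^T
    Lam = np.diag(lam_minus)
    C_tilde = C @ U_minus                                # C~ = C U_-
    S = C_tilde @ Lam @ C_tilde.T + R                    # C~ Lambda(-) C~^T + R
    Lam_tilde = Lam - Lam @ C_tilde.T @ np.linalg.solve(S, C_tilde @ Lam)
    Lam_tilde = 0.5 * (Lam_tilde + Lam_tilde.T)          # remove round-off asymmetry
    lam_plus, H = np.linalg.eigh(Lam_tilde)              # Lambda~(+) = H Lambda(+) H^T
    order = np.argsort(lam_plus)[::-1]                   # dominant errors first
    return U_minus @ H[:, order], lam_plus[order]        # U_+ = U_- H

rng = np.random.default_rng(0)
n, m = 5, 3
A = rng.standard_normal((n, n))
P_minus = A @ A.T + n * np.eye(n)      # synthetic positive definite P(-)
C = rng.standard_normal((m, n))        # synthetic observation matrix
R = np.eye(m)                          # synthetic data error covariance

U_plus, lam_plus = eig_update(P_minus, C, R)
K = P_minus @ C.T @ np.linalg.inv(C @ P_minus @ C.T + R)
P_plus_direct = P_minus - K @ C @ P_minus
P_plus_eig = U_plus @ np.diag(lam_plus) @ U_plus.T
print(np.allclose(P_plus_direct, P_plus_eig))   # True
```

Only the $n\times n$ matrix $\tilde{\boldsymbol{\Lambda}}(+)$ is decomposed; truncating it to its dominant part is what the ES version exploits.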

#### 2) Minimum sample ES variance linear update

In this section, a *sample* ES forecast described by **E**_{−} and **Π**(−) is assumed available (section 5b). The data–forecast melding within the sample ES is outlined, with updates of the fields and ES covariance. For details, see Lermusiaux (1997).

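*(i)* Dynamical state update

Within the sample ES, the state melding and its gain are

$$\hat{\psi}(+) = \hat{\psi}(-) + \mathbf{K}^p[\mathbf{d} - \mathbf{C}\hat{\psi}(-)], \tag{17}$$

$$\mathbf{K}^p = \mathbf{E}_-\boldsymbol{\Pi}(-)\tilde{\mathbf{C}}^{pT}[\tilde{\mathbf{C}}^p\boldsymbol{\Pi}(-)\tilde{\mathbf{C}}^{pT} + \mathbf{R}]^{-1}, \tag{18}$$

where $\tilde{\mathbf{C}}^p \doteq \mathbf{C}\mathbf{E}_-$. Since (18) can also be written $\mathbf{K}^p = \mathbf{P}^p(-)\mathbf{C}^T[\mathbf{C}\mathbf{P}^p(-)\mathbf{C}^T + \mathbf{R}]^{-1}$ with $\mathbf{P}^p(-) \doteq \mathbf{E}_-\boldsymbol{\Pi}(-)\mathbf{E}_-^T$, for adequate a priori sampling, that is, $\mathbf{P}^p(-) \approx \mathbf{P}(-)$, it approximates the minimum error variance gain of section 6a(1).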

*(ii)* Sample ES update

The derivation of the ES covariance update requires care since the present ES forecast (section 5b) is obtained from an ensemble forecast. As discussed by Evensen (1997b), Lermusiaux (1997), and Burgers et al. (1998), the original ensemble update algorithm (e.g., Evensen 1994a) underestimates a posteriori errors. The two ES algorithms outlined next give a correct error estimate in the infinite ensemble limit. Scheme A directly estimates **Π**(+) and **E**_{+}. Scheme B updates the SVD of the ensemble spread.

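*Scheme A: Update of the sample ES covariance.* An ensemble of *q unbiased* dynamical states is denoted by $\hat{\psi}^j(-)$, $j = 1, \ldots, q$. The associated a priori and a posteriori error sample matrices, $\mathbf{M}(-), \mathbf{M}(+) \in \mathbb{R}^{n\times q}$, are^{5}

$$\mathbf{M}(-) \doteq [\hat{\psi}^j(-) - \hat{\psi}(-)], \qquad \mathbf{M}(+) \doteq [\hat{\psi}^j(+) - \hat{\psi}(+)],$$

where column *j* consists of the ensemble member *j* minus the mean estimate $\hat{\psi}(-)$ or $\hat{\psi}(+)$. An update for the sample error covariance, $\mathbf{P}^s \doteq \mathbf{M}\mathbf{M}^T/q$, is now obtained. Denoting by $\mathbf{d}^j$ a set of *q* data vectors perturbed with noise of zero mean and covariance $\mathbf{R}$, and by $\mathbf{K}^s$ the sample gain, the ensemble update is

$$\hat{\psi}^j(+) = \hat{\psi}^j(-) + \mathbf{K}^s[\mathbf{d}^j - \mathbf{C}\hat{\psi}^j(-)], \qquad j = 1, \ldots, q,$$

or, for the error sample matrices,

$$\mathbf{M}(+) = (\mathbf{I} - \mathbf{K}^s\mathbf{C})\mathbf{M}(-) + \mathbf{K}^s\mathbf{V},$$

where the columns of $\mathbf{V} \doteq [\mathbf{v}^j] = [\mathbf{d}^j - \mathbf{d}] \in \mathbb{R}^{m\times q}$ are realizations of the random process $\mathbf{v}$. The update of the sample error covariance derives from (21):

$$\mathbf{P}^s(+) = (\mathbf{I} - \mathbf{K}^s\mathbf{C})\mathbf{P}^s(-)(\mathbf{I} - \mathbf{K}^s\mathbf{C})^T + \mathbf{K}^s\mathbf{R}^s\mathbf{K}^{sT} + (\mathbf{I} - \mathbf{K}^s\mathbf{C})\boldsymbol{\Omega}^s\mathbf{K}^{sT} + \mathbf{K}^s\boldsymbol{\Omega}^{sT}(\mathbf{I} - \mathbf{K}^s\mathbf{C})^T,$$

with

$$\mathbf{R}^s \doteq \frac{1}{q}\mathbf{V}\mathbf{V}^T = \frac{1}{q}\sum_{j=1}^{q}\mathbf{v}^j\mathbf{v}^{jT}, \qquad \boldsymbol{\Omega}^s \doteq \frac{1}{q}\mathbf{M}(-)\mathbf{V}^T = \frac{1}{q}\sum_{j=1}^{q}[\hat{\psi}^j(-) - \hat{\psi}(-)]\mathbf{v}^{jT}.$$

Choosing the sample gain

$$\mathbf{K}^s = \mathbf{P}^s(-)\mathbf{C}^T[\mathbf{C}\mathbf{P}^s(-)\mathbf{C}^T + \mathbf{R}^s]^{-1},$$

and noting that $\boldsymbol{\Omega}^s \to 0$ for $q \to \infty$, since the data perturbations are independent of the forecast errors, neglecting the associated symmetric sum in (23) yields an estimate of $\mathbf{P}(+)$ at large *q*:

$$\mathbf{P}^s(+) = (\mathbf{I} - \mathbf{K}^s\mathbf{C})\mathbf{P}^s(-).$$

The ES scheme A is the rank-*p* reduction of (19)–(24). It is efficiently estimated based on the SVDs of $\mathbf{M}(-)$ and $\mathbf{M}(+)$,

$$\mathbf{E}_-\boldsymbol{\Sigma}(-)\mathbf{V}_-^T = \mathrm{SVD}_p[\mathbf{M}(-)], \qquad \mathbf{E}_+\boldsymbol{\Sigma}(+)\mathbf{V}_+^T = \mathrm{SVD}_p[\mathbf{M}(+)],$$

where $\mathrm{SVD}_p(\cdot)$ selects the dominant rank-*p* SVD. After melding, the *p* dominant left singular vectors, columns of $\mathbf{E}_+$, form the ordered basis for the ES of dimension $p \le q$. The corresponding singular values yield the diagonals $\boldsymbol{\Pi}_k(-)$ and $\boldsymbol{\Pi}_k(+)$,

$$\boldsymbol{\Pi}(-) \doteq \frac{1}{q}\boldsymbol{\Sigma}^2(-), \qquad \boldsymbol{\Pi}(+) \doteq \frac{1}{q}\boldsymbol{\Sigma}^2(+).$$

In the ES, the covariance update then reduces to the eigendecomposition of a $p\times p$ matrix,

$$\tilde{\boldsymbol{\Pi}}(+) \doteq \{\mathbf{I}^p - \boldsymbol{\Pi}(-)\tilde{\mathbf{C}}^{pT}[\tilde{\mathbf{C}}^p\boldsymbol{\Pi}(-)\tilde{\mathbf{C}}^{pT} + \mathbf{R}]^{-1}\tilde{\mathbf{C}}^p\}\boldsymbol{\Pi}(-) = \mathbf{H}\boldsymbol{\Pi}(+)\mathbf{H}^T,$$

$$\mathbf{P}^p(+) = \mathbf{E}_+\boldsymbol{\Pi}(+)\mathbf{E}_+^T, \qquad \mathbf{E}_+ = \mathbf{E}_-\mathbf{H},$$

where the columns of $\mathbf{H} \in \mathbb{R}^{p\times p}$ are orthonormal eigenvectors of $\tilde{\boldsymbol{\Pi}}(+)$, the corresponding eigenvalues form the diagonal $\boldsymbol{\Pi}(+)$, and $\mathbf{I}^p \in \mathbb{R}^{p\times p}$ is the identity matrix. With (18) and (24)–(25), the columns of $\mathbf{E}_-$ and $\mathbf{E}_+$ in (28) span the same space. Table 3 summarizes the sample ES scheme. Algorithmic and computing issues are discussed in appendix B, section e.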

In scheme A, the ensemble update (20a) is not carried out; it was only used to derive the ES covariance update. Only one melding (17) is necessary. The number *p* of significant error EOFs or singular vectors is smaller than the ensemble size *q*, which reduces the computations in Table 3 (appendix B, section c). Of course, for efficient ensemble forecasts, *q* should not be much larger than *p* (i.e., at most an order of magnitude larger).
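A minimal sketch of the scheme A melding follows, assuming that the a priori mean is the ensemble mean and that the ES comes from a rank-*p* SVD of the ensemble spread (Python with NumPy; the function `ess_scheme_a` and the synthetic ensemble and data are hypothetical). It performs the single state melding (17) with the ES gain (18) and the $p\times p$ eigendecomposition that rotates $\mathbf{E}_-$ into $\mathbf{E}_+$; the ensemble members themselves are not updated.

```python
import numpy as np

def ess_scheme_a(ens, d, C, R, p):
    """Sample ES melding in the spirit of scheme A (sketch).

    ens: n x q array of a priori states psi^j(-); p: ES dimension (p <= q).
    Returns psi(+), E_+ (n x p) and the diagonal of Pi(+).
    """
    q = ens.shape[1]
    psi_minus = ens.mean(axis=1)                      # assumption: psi(-) = ensemble mean
    M = ens - psi_minus[:, None]                      # M(-)
    U, s, _ = np.linalg.svd(M, full_matrices=False)
    E_minus, pi_minus = U[:, :p], s[:p] ** 2 / q      # rank-p SVD, Pi(-) = Sigma^2 / q
    Pi = np.diag(pi_minus)
    Ct = C @ E_minus                                  # C~^p = C E_-
    G = Pi @ Ct.T @ np.linalg.inv(Ct @ Pi @ Ct.T + R) # p x m coefficients of the gain
    K = E_minus @ G                                   # K^p = E_- Pi(-) C~^pT [...]^-1
    psi_plus = psi_minus + K @ (d - C @ psi_minus)    # single state melding
    Pi_tilde = (np.eye(p) - G @ Ct) @ Pi              # p x p a posteriori matrix
    Pi_tilde = 0.5 * (Pi_tilde + Pi_tilde.T)          # remove round-off asymmetry
    pi_plus, H = np.linalg.eigh(Pi_tilde)             # Pi~(+) = H Pi(+) H^T
    order = np.argsort(pi_plus)[::-1]
    return psi_plus, E_minus @ H[:, order], pi_plus[order]   # E_+ = E_- H

rng = np.random.default_rng(1)
n, q, m, p = 6, 10, 2, 4
ens = rng.standard_normal((n, q))       # synthetic a priori ensemble
C = rng.standard_normal((m, n))         # synthetic observation matrix
R = 0.5 * np.eye(m)
d = rng.standard_normal(m)              # synthetic data
psi_p, E_p, pi_p = ess_scheme_a(ens, d, C, R, p)
```

In practice the states would come from the nonlinear ensemble forecast; the only matrix inverted here is $m\times m$ and the only eigendecomposition is $p\times p$.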

*Scheme B: Update of the SVD of the ensemble spread.* A disadvantage of (27)–(28) is that the information contained in the right singular vectors of (25a) is lost. Rather than only $\mathbf{E}_+$, $\boldsymbol{\Pi}(+)$, and $\hat{\psi}(+)$, an update of the complete rank-*p* SVD is now sought. Using (25a)–(25b) to reduce (19a)–(19b), rewritten for clarity in a vector form [with $(\mathbf{A})^j$ denoting the column *j* of $\mathbf{A}$],

$$\hat{\psi}^j(+) = \hat{\psi}(+) + [(\mathbf{I} - \mathbf{K}^s\mathbf{C})\mathbf{M}(-)]^j + (\mathbf{K}^s\mathbf{V})^j, \qquad j = 1, \ldots, q,$$

the rank-*p* SVD of the a posteriori ensemble spread is

$$\mathbf{E}_+\boldsymbol{\Sigma}(+)\mathbf{V}_+^T = \mathrm{SVD}_p[(\mathbf{I} - \mathbf{K}^s\mathbf{C})\mathbf{M}(-) + \mathbf{K}^s\mathbf{V}],$$

in which $\mathbf{M}(-)$ and $\mathbf{V}$ are also replaced by their rank-*p* SVDs. These are denoted by (25a) for the ensemble of states and by $\mathrm{SVD}_p(\mathbf{V})$ for the perturbed data. With these rank-*p* approximations, using (25b) and the optimal gain (18), (21) reduces, for large ensembles, to the update of the rank-*p* SVD (29b), as summarized in Table 4. The advantage of (30) or (33a)–(33c) over (27)–(28) is the right singular vector update. The a posteriori states [(29b)] are physically balanced in the sense of the estimation criterion (section 2). If one uses Table 3, techniques to create an ensemble of a posteriori states from (27)–(28) are needed [section 6b(2)]. On the other hand, scheme A is an efficient procedure to compute the a posteriori ES covariance (27)–(28). It is in fact straightforward to show that the a posteriori rank-*p* sample error covariance obtained from (29b) and (33a)–(33c) is, in the limit of large ensembles, that of (27)–(28). Both schemes allow the dimension *p* of the ES to be time variant; algorithms for evolving *p* are discussed next in section 6b.
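The sketch below illustrates the perturbed-data ensemble update and the rank-*p* SVD of the a posteriori spread (Python with NumPy). It is a full-rank illustration in the spirit of scheme B, not the rank-*p* reductions of Table 4; the observation matrix, data, and Gaussian ensemble are hypothetical. It lets one verify that the a posteriori sample covariance contains the cross terms in $\boldsymbol{\Omega}^s$, which become small for a large ensemble.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, q, p = 3, 2, 20000, 3
C = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])             # hypothetical: first two components observed
R = 0.5 * np.eye(m)                         # hypothetical data error covariance
d = np.array([0.3, -0.2])                   # hypothetical data vector
ens = rng.standard_normal((n, q))           # hypothetical a priori ensemble psi^j(-)

psi_minus = ens.mean(axis=1)
M_minus = ens - psi_minus[:, None]                        # M(-)
V = np.linalg.cholesky(R) @ rng.standard_normal((m, q))   # v^j = d^j - d, zero mean, cov R
Ps = M_minus @ M_minus.T / q                              # P^s(-)
Rs = V @ V.T / q                                          # R^s
Ks = Ps @ C.T @ np.linalg.inv(C @ Ps @ C.T + Rs)          # K^s

psi_plus = psi_minus + Ks @ (d - C @ psi_minus)           # melded mean, unperturbed data
ens_plus = ens + Ks @ (d[:, None] + V - C @ ens)          # members melded with d^j
M_plus = ens_plus - psi_plus[:, None]                     # = (I - K^s C) M(-) + K^s V

U, s, Vt = np.linalg.svd(M_plus, full_matrices=False)
E_plus, Sigma_plus, V_plus = U[:, :p], np.diag(s[:p]), Vt[:p].T   # rank-p SVD
Pi_plus = Sigma_plus ** 2 / q                             # Pi(+) = Sigma(+)^2 / q
```

Because the right singular vectors `V_plus` are kept, each a posteriori member can be rebuilt as $\hat{\psi}(+) + \mathbf{E}_+\boldsymbol{\Sigma}(+)(\mathbf{V}_+^T)^j$.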

### b. Dynamical state and error subspace forecast

The quantities *ψ̂*_{k}(+), **E**_{k}(+), and **Π**_{k}(+) obtained in section 6a are now known. The goal is to issue their forecast to the next data time *t*_{k+1}. For large nonlinear models, we expect that, given adequate sampling of the initial error conditions, an ensemble forecast efficiently estimates the evolution of the state and its uncertainty. Monte Carlo forecasting is, thus, the approach followed. Several alternatives are discussed in appendix B, section c.

#### 1) Dynamical state forecast

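The conditional mean $\hat{\psi}^t$ of the true state evolves according to

$$d\hat{\psi}^t = \hat{\mathbf{f}}(\psi^t, t)\,dt,$$

where $\hat{\mathbf{f}}$ denotes the conditional mean of $\mathbf{f}$. A first estimate is the *central forecast* to $t_{k+1}$, $\hat{\psi}^{cf}_{k+1}(-)$, obtained by integrating

$$d\hat{\psi} = \mathbf{f}(\hat{\psi}, t)\,dt, \qquad \hat{\psi}_k = \hat{\psi}_k(+),$$

to $t_{k+1}$. Alternatively, $\hat{\psi}^t_{k+1}$ is estimated from an ensemble of states integrated to $t_{k+1}$ as in (2a),

$$d\hat{\psi}^j = \mathbf{f}(\hat{\psi}^j, t)\,dt + \mathbf{dw}, \qquad \hat{\psi}^j_k = \hat{\psi}^j_k(+), \qquad j = 1, \ldots, q,$$

whose *ensemble mean* at $t_{k+1}$ estimates $\hat{\psi}^t_{k+1}$ for large $q$:

$$\hat{\psi}^{em}_{k+1}(-) = \frac{1}{q}\sum_{j=1}^{q}\hat{\psi}^j_{k+1}(-).$$

Either estimate can serve as $\hat{\psi}_{k+1}(-)$.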
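To illustrate the difference between the central forecast and the ensemble mean, the sketch below integrates both with an Euler–Maruyama discretization of (2a). The time-stepping scheme, the two-variable dynamics, and the model-error factor `B` are assumptions for illustration only (Python with NumPy). For linear deterministic dynamics and an ensemble centered on $\hat{\psi}_k(+)$, the two estimates coincide; with nonlinear dynamics and model errors they differ.

```python
import numpy as np

def euler_maruyama(f, psi0, t0, t1, nsteps, B=None, rng=None):
    """Integrate d psi = f(psi, t) dt + B dW with the Euler-Maruyama scheme.

    psi0 is a state (n,) or an ensemble (n, q); with B=None the integration
    is deterministic, as for the central forecast.
    """
    dt = (t1 - t0) / nsteps
    psi = np.array(psi0, dtype=float)
    t = t0
    for _ in range(nsteps):
        step = f(psi, t) * dt
        if B is not None:
            dW = rng.standard_normal((B.shape[1],) + psi.shape[1:]) * np.sqrt(dt)
            step = step + B @ dW                     # model-error increment dw
        psi = psi + step
        t += dt
    return psi

A = np.array([[0.0, 1.0], [-1.0, -0.1]])
linear = lambda psi, t: A @ psi                      # hypothetical linear dynamics
nonlinear = lambda psi, t: A @ psi - psi ** 3        # hypothetical nonlinear dynamics
B = 0.1 * np.eye(2)                                  # hypothetical model-error factor

rng = np.random.default_rng(3)
psi_k = np.array([1.0, 0.0])                         # psi_k(+)
pert = 0.3 * rng.standard_normal((2, 50))
members = psi_k[:, None] + pert - pert.mean(axis=1, keepdims=True)   # psi^j_k(+), mean psi_k(+)

cf_lin = euler_maruyama(linear, psi_k, 0.0, 1.0, 100)
em_lin = euler_maruyama(linear, members, 0.0, 1.0, 100).mean(axis=1)
cf_nl = euler_maruyama(nonlinear, psi_k, 0.0, 1.0, 100)
em_nl = euler_maruyama(nonlinear, members, 0.0, 1.0, 100, B=B, rng=rng).mean(axis=1)
```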

#### 2) Error subspace forecast

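The error covariance $\mathbf{P}$ of the true state about its conditional mean $\hat{\psi}^t$ is forecast to $t_{k+1}$ according to $\mathbf{P}^p_{k+1}(-)$, the rank-*p* sample covariance of an ensemble forecast to $t_{k+1}$ of states initially sampling the a posteriori ES structure $\mathbf{E}_k(+)$ and amplitude $\boldsymbol{\Pi}_k(+)$. The three local steps involved are described next.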

*(i)* Create an ensemble whose covariance from *ψ̂*_{k}(+) tends to **P**^{p}_{k}(+)

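To create such an ensemble, $q$ states are generated around $\hat{\psi}_k(+)$,

$$\hat{\psi}^j_k(+) = \hat{\psi}_k(+) + \mathbf{E}_k(+)\boldsymbol{\pi}^j_k(+), \qquad j = 1, \ldots, q, \tag{38}$$

where the coefficients $\boldsymbol{\pi}^j_k(+) \in \mathbb{R}^p$ have to be determined. The simplest choice is

$$\boldsymbol{\pi}^j_k(+) = \boldsymbol{\Pi}^{1/2}_k(+)\mathbf{u}^j, \tag{39a}$$

where the $\mathbf{u}^j \in \mathbb{R}^p$ are $q$ realizations of a random vector $\mathbf{u}$ of zero mean and covariance $\mathbf{I}^p$. For $q = q_k(+) \to \infty$ in (38)–(39a), the sample covariance of the $\hat{\psi}^j_k(+)$ with respect to $\hat{\psi}_k(+)$ tends to $\mathbf{P}^p_k(+) = \mathbf{E}_k(+)\boldsymbol{\Pi}_k(+)\mathbf{E}^T_k(+)$. For finite $q$, however, some of the $\hat{\psi}^j_k(+)$ can be unrealistic, since the combinations $\mathbf{u}^j$ are free in (39a). Because of the orthogonality condition, some combinations of singular vectors can also lead to unrealistic variability, even if the true error subspace is spanned. Finally, some of the randomly generated $\mathbf{u}^j$ (e.g., Gaussian) can have values quite far from their statistical mean and variance. Hence, constraining the combinations (39a) is useful to reject the few states of possibly too unrealistic or unlikely physics. A simple option is to set the $\hat{\psi}^j_k(+)$ to the a posteriori states of scheme B, that is,

$$\boldsymbol{\pi}^j_k(+) = \boldsymbol{\Sigma}(+)(\mathbf{V}^T_+)^j,$$

for which the sample covariance with respect to $\hat{\psi}_k(+)$ equals $\mathbf{E}_k(+)\boldsymbol{\Pi}_k(+)\mathbf{E}^T_k(+)$ exactly, since $\mathbf{V}_+^T\mathbf{V}_+ = \mathbf{I}^p$ and $\boldsymbol{\Pi}(+) = \boldsymbol{\Sigma}^2(+)/q$.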

An important advantage of (38)–(39) is that the size of the a posteriori ensemble *q*_{k}(+) is easily made larger than that of the a priori one, *q*_{k}(−). Since *q*_{k}(+) = *q*_{k+1}(−), this is carried out as a function of the convergence of the ES forecast to *t*_{k+1} (appendix B, section b). Note that increasing the size of the ensemble using (38)–(39) does not extend the base spanning the ES at *t*_{k} [in our two-step recursive assumption, **E**_{k}(+) is given]. However, the nonlinear integration from *t*_{k} to *t*_{k+1} increases the size of the ES forecast to *t*_{k+1}, the significance of which grows with the duration of integration. The nonlinearities lead to an evolving ES (section 5c). This fact is illustrated in Part II. Stochastic model errors in (36a) also excite growing modes of variability and thus favor the ES evolution. With linear models, the size of the ES is only modified by this stochastic forcing. If model errors are null, one must then add new columns to **E**_{k} to evolve the ES.
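The two ways of creating the a posteriori ensemble can be sketched as follows (Python with NumPy). The function names, the Gaussian choice for $\mathbf{u}^j$, and the synthetic ES are assumptions. With random coefficients the sample covariance only tends to $\mathbf{P}^p_k(+)$ as $q$ grows, whereas the singular-vector coefficients reproduce it exactly for any $q$.

```python
import numpy as np

def sample_es(psi_plus, E_plus, pi_plus, q, rng):
    """States psi^j = psi(+) + E_+ Pi(+)^{1/2} u^j with u^j ~ N(0, I^p)."""
    u = rng.standard_normal((E_plus.shape[1], q))      # Gaussian u^j (an assumption)
    return psi_plus[:, None] + E_plus @ (np.sqrt(pi_plus)[:, None] * u)

def states_from_svd(psi_plus, E_plus, sigma_plus, V_plus):
    """States psi^j = psi(+) + E_+ Sigma(+) (V_+^T)^j, available with scheme B."""
    return psi_plus[:, None] + E_plus @ (sigma_plus[:, None] * V_plus.T)

rng = np.random.default_rng(4)
n, p = 6, 3
E_plus, _ = np.linalg.qr(rng.standard_normal((n, p)))   # synthetic orthonormal ES basis
pi_plus = np.array([4.0, 1.0, 0.25])                     # synthetic ES variances
psi_plus = np.zeros(n)
P_p = E_plus @ np.diag(pi_plus) @ E_plus.T               # P^p(+)

q = 50000
X = sample_es(psi_plus, E_plus, pi_plus, q, rng) - psi_plus[:, None]
P_random = X @ X.T / q                                   # tends to P^p(+) as q grows

V_plus, _ = np.linalg.qr(rng.standard_normal((q, p)))    # synthetic right singular vectors
sigma_plus = np.sqrt(q * pi_plus)                        # Pi(+) = Sigma(+)^2 / q
Y = states_from_svd(psi_plus, E_plus, sigma_plus, V_plus) - psi_plus[:, None]
P_svd = Y @ Y.T / q                                      # equals P^p(+) exactly
```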

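*(ii)* Integrate each ensemble member from *t*_{k} to *t*_{k+1} using the sample path (2a)

Each member is integrated to $t_{k+1}$ with the stochastic model,

$$d\hat{\psi}^j = \mathbf{f}(\hat{\psi}^j, t)\,dt + \mathbf{dw}(t), \qquad j = 1, \ldots, q,$$

where $\mathbf{dw}(t)$ is a vector Brownian motion process representing a priori model errors. Its covariance over $dt$ is

$$\mathcal{E}\{\mathbf{dw}(t)\,\mathbf{dw}^T(t)\} \doteq \mathbf{Q}(t)\,dt = \mathbf{B}(t)\mathbf{B}^T(t)\,dt, \qquad \mathbf{B}(t) \in \mathbb{R}^{n\times r}.$$

The ES concept argues for restricting $\mathbf{B}(t)$ to $r \ll n$ columns, corresponding to the rank-$r$ eigendecomposition of $\mathbf{Q}(t)$.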
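A short sketch of this restriction, assuming $\mathbf{Q}$ is available as a dense symmetric matrix (Python with NumPy; the function names and the synthetic $\mathbf{Q}$ are hypothetical): the factor $\mathbf{B}$ keeps the $r$ dominant eigenvectors of $\mathbf{Q}$ scaled by the square roots of their eigenvalues, so that $\mathbf{B}\mathbf{B}^T$ is the rank-$r$ eigen-truncation of $\mathbf{Q}$ and only $r$ random numbers are needed per increment $\mathbf{dw}$.

```python
import numpy as np

def rank_r_noise_factor(Q, r):
    """Return B (n x r) such that B B^T is the rank-r eigen-truncation of Q."""
    lam, U = np.linalg.eigh(Q)                  # eigenvalues in ascending order
    idx = np.argsort(lam)[::-1][:r]             # r dominant eigenpairs
    return U[:, idx] * np.sqrt(np.clip(lam[idx], 0.0, None))

def model_error_increment(B, dt, rng):
    """One realization of dw over dt, with covariance B B^T dt."""
    return B @ rng.standard_normal(B.shape[1]) * np.sqrt(dt)

rng = np.random.default_rng(5)
U, _ = np.linalg.qr(rng.standard_normal((5, 5)))
Q = U @ np.diag([5.0, 3.0, 1.0, 0.5, 0.1]) @ U.T     # synthetic model-error covariance
B = rank_r_noise_factor(Q, 2)
dw = model_error_increment(B, 0.01, rng)
```

The neglected part of $\mathbf{Q}$ has a spectral norm equal to the largest discarded eigenvalue (here 1.0).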

*(iii)* Compute the forecast of the ES structure and amplitude at *t*_{k+1}

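With $\hat{\psi}_{k+1}(-)$ obtained as in section 6b(1), the matrix of forecast differences

$$\mathbf{M}_{k+1}(-) = [\hat{\psi}^j_{k+1}(-) - \hat{\psi}_{k+1}(-)] \in \mathbb{R}^{n\times q}$$

is evaluated. The sample estimates $\boldsymbol{\Pi}_{k+1}(-)$ and $\mathbf{E}_{k+1}(-)$ of rank $p \le q$ are then most efficiently obtained from the rank-*p* SVD of $\mathbf{M}_{k+1}(-)$,

$$\mathbf{E}_{k+1}(-)\boldsymbol{\Sigma}_{k+1}(-)\mathbf{V}^T_{k+1}(-) = \mathrm{SVD}_p[\mathbf{M}_{k+1}(-)], \qquad \boldsymbol{\Pi}_{k+1}(-) \doteq \frac{1}{q}\boldsymbol{\Sigma}^2_{k+1}(-).$$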
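This step amounts to a thin SVD of the forecast spread. The following minimal sketch (Python with NumPy; the function `es_forecast` and the synthetic ensemble are assumptions) returns $\mathbf{E}_{k+1}(-)$ and the diagonal of $\boldsymbol{\Pi}_{k+1}(-)$. The thin SVD avoids forming the $n\times n$ sample covariance, which matters when $n \gg q$.

```python
import numpy as np

def es_forecast(members, psi_ref, p):
    """Rank-p ES forecast from an ensemble at t_{k+1} (sketch).

    members: n x q forecast states psi^j_{k+1}(-); psi_ref: psi_{k+1}(-).
    Returns E_{k+1}(-) (n x p) and the diagonal of Pi_{k+1}(-) = Sigma^2 / q.
    """
    q = members.shape[1]
    M = members - psi_ref[:, None]                   # M_{k+1}(-)
    U, s, _ = np.linalg.svd(M, full_matrices=False)  # singular values in decreasing order
    return U[:, :p], s[:p] ** 2 / q

rng = np.random.default_rng(6)
members = rng.standard_normal((8, 5))                # synthetic forecast ensemble, q = 5
psi_ref = members.mean(axis=1)                       # here the ensemble mean
E, pi = es_forecast(members, psi_ref, p=2)
E_all, pi_all = es_forecast(members, psi_ref, p=5)
```

With `p` equal to the number of members, $\mathbf{E}\boldsymbol{\Pi}\mathbf{E}^T$ reproduces the full sample covariance $\mathbf{M}\mathbf{M}^T/q$; smaller `p` keeps only the dominant errors.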

The basis of the present ESSE filtering scheme consists of Tables 3 and 5. A flow schematic of the algorithm is shown in Fig. 2. As in Fig. 1, each operation consists of several subcomputations and options, with corresponding equations (e.g., appendix B). For instance, the ES initialization and the adaptive error subspace learning are challenging (Lermusiaux 1997; Lermusiaux et al. 1998, manuscript submitted to *Quart. J. Roy. Meteor. Soc.*).

## 7. Smoothing via ESSE schemes

The improvement of the filtering solution *ψ̂*_{k}(+) by future data, that is, the smoothing estimate *ψ̂*_{k/N}, is now addressed. The interval [*t*_{0}, *t*_{N}] is fixed: fixed-interval smoothing is considered. The complete problem statement is as in Table 1, but with (6b) replaced by (6c). The issues in nonlinear smoothing are that a forward integration of (2a) is rarely invertible and that running nonlinear geophysical models backward in time can be difficult. In fact, most nonlinear smoothers are based on some sort of localized (and iterated) approximation (Jazwinski 1970). The present approach falls into that category. However, its philosophy differs somewhat from that of some schemes previously utilized in geophysical studies: it is argued that i) given the lack of data, an accurate filtering solution (section 6) is an essential prerequisite to the smoothing; ii) the linearization, if any, should be local enough; and iii) in real-time smoothing, a few iterations of approximate schemes can be very valuable (section 7c). With this in mind, smoothing via statistical approximation is developed in section 7a. Its ESSE version is outlined in section 7b. A discussion, with additional approximate ESSE algorithms, is presented in section 7c. Specifics are provided in appendix B.

### a. Smoothing via statistical approximation

The approach consists of nonlinear filtering until *t*_{N} (sections 4–6), followed by the update of the conditional mean and error covariance, backward in time, based on future data. This latter component is now outlined. It is recursive, as was the filtering. The derivation presumes that the smoothing estimates *ψ̂*_{k+1/N} and **P**_{k+1/N} have been obtained. The unknowns are *ψ̂*_{k/N} and **P**_{k/N}. A statistical approximation (e.g., Gelb 1974; Austin and Leondes 1981) to the forward integration of (2a) between data times *t*_{k} and *t*_{k+1} is first derived, assuming that *ψ*^{t}_{k+1} is known. Replacing *ψ*^{t}_{k+1} by its best estimate at *t*_{k+1}, it is then used to compute *ψ̂*_{k/N} and **P**_{k/N}. The resulting smoothing is shown to include a few classic schemes as particular cases. Its truncation to a significant subspace is outlined in section 7b.

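It is assumed that $\Delta t_{k+1} = t_{k+1} - t_k$ is small enough and that the statistical linearization is made around the data-corrected filtering solution, characterized by the initial conditions, $\hat{\psi}_k(+)$ and $\mathbf{P}_k(+)$, and forecasts, $\hat{\psi}_{k+1}(-)$ and $\mathbf{P}_{k+1}(-)$. We seek a linear relation that estimates how to correct $\hat{\psi}_k(+)$ given $\psi^t_{k+1}$:

$$\hat{\psi}_k = \hat{\psi}_k(+) + \mathbf{L}_k[\psi^t_{k+1} - \hat{\psi}_{k+1}(-)].$$

The matrix $\mathbf{L}_k$ should be such that (42) minimizes the error variance of $\hat{\psi}_k$ with respect to $\psi^t_k$:

$$\min_{\mathbf{L}_k}\ \mathrm{tr}(\mathbf{P}_k), \qquad \mathbf{P}_k \doteq \mathcal{E}\{(\hat{\psi}_k - \psi^t_k)(\hat{\psi}_k - \psi^t_k)^T\}.$$

Using (42), the definition of $\mathbf{P}_k$ in (43) leads to

$$\mathbf{P}_k = \mathbf{P}_k(+) + \mathbf{L}_k\mathbf{P}_{k+1}(-)\mathbf{L}_k^T - \mathbf{L}_k\,\mathcal{E}\{[\psi^t_{k+1} - \hat{\psi}_{k+1}(-)][\psi^t_k - \hat{\psi}_k(+)]^T\} - \mathcal{E}\{[\psi^t_k - \hat{\psi}_k(+)][\psi^t_{k+1} - \hat{\psi}_{k+1}(-)]^T\}\,\mathbf{L}_k^T,$$

where $\mathbf{P}_{k+1}(-)$ is the error covariance forecast carried out from $\mathbf{P}_k(+)$ based on (1a)–(2a). Inserting (44) into (43) and taking the derivative with respect to $\mathbf{L}_k$ yields

$$\mathbf{L}_k = \mathcal{E}\{[\psi^t_k - \hat{\psi}_k(+)][\psi^t_{k+1} - \hat{\psi}_{k+1}(-)]^T\}\,\mathbf{P}^{-1}_{k+1}(-).$$

This stationary point is a minimum since $\mathbf{P}_{k+1}(-)$ is positive semidefinite. The optimum (45) and relation (42) define the statistical backward linearization sought. It is now employed to compute the smoothing conditions at $t_k$. The best available unbiased estimate of $\psi^t_{k+1}$ is $\hat{\psi}_{k+1/N}$, of error covariance $\mathbf{P}_{k+1/N}$. Using it in (42) gives the smoothing estimate

$$\hat{\psi}_{k/N} = \hat{\psi}_k(+) + \mathbf{L}_k[\hat{\psi}_{k+1/N} - \hat{\psi}_{k+1}(-)], \qquad \hat{\psi}_{N/N} = \hat{\psi}_N(+).$$

To obtain its error covariance, (46) is rewritten as

$$(\hat{\psi}_{k/N} - \psi^t_k) - \mathbf{L}_k\hat{\psi}_{k+1/N} = [\hat{\psi}_k(+) - \psi^t_k] - \mathbf{L}_k\hat{\psi}_{k+1}(-).$$

Multiplying each side by its transpose, taking expectations, and using the orthogonality of the estimation errors to the estimates,

$$\mathcal{E}\{[\hat{\psi}_k(+) - \psi^t_k]\hat{\psi}^T_{k+1}(-)\} = 0, \qquad \mathcal{E}\{(\hat{\psi}_{k/N} - \psi^t_k)\hat{\psi}^T_{k+1/N}\} = 0,$$

leads to

$$\mathbf{P}_{k/N} + \mathbf{L}_k\mathcal{E}\{\hat{\psi}_{k+1/N}\hat{\psi}^T_{k+1/N}\}\mathbf{L}_k^T = \mathbf{P}_k(+) + \mathbf{L}_k\mathcal{E}\{\hat{\psi}_{k+1}(-)\hat{\psi}^T_{k+1}(-)\}\mathbf{L}_k^T.$$

For an estimate $\hat{\psi}_{k+1}$ of error covariance $\mathbf{P}_{k+1}$ whose error is orthogonal to it, $\mathcal{E}\{\psi^t_{k+1}\psi^{tT}_{k+1}\} = \mathbf{P}_{k+1} + \mathcal{E}\{\hat{\psi}_{k+1}\hat{\psi}^T_{k+1}\}$. Applying this to $\hat{\psi}_{k+1/N}$ and $\hat{\psi}_{k+1}(-)$ and eliminating $\mathcal{E}\{\psi^t_{k+1}\psi^{tT}_{k+1}\}$ yields

$$\mathbf{P}_{k/N} = \mathbf{P}_k(+) + \mathbf{L}_k[\mathbf{P}_{k+1/N} - \mathbf{P}_{k+1}(-)]\mathbf{L}_k^T, \qquad \mathbf{P}_{N/N} = \mathbf{P}_N(+).$$

The complete scheme is stated in Table 6. In simple words, the smoothing correction to $\mathbf{P}_k(+)$ is the difference $\mathbf{P}_{k+1/N} - \mathbf{P}_{k+1}(-)$ between the smoothing and forecast error covariances at $t_{k+1}$, mapped back to $t_k$ by $\mathbf{L}_k$.

When the error dynamics between $t_k$ and $t_{k+1}$ are linearized around the filtering trajectory,

$$d(\psi^t - \hat{\psi}) = \frac{\partial\mathbf{f}}{\partial\psi}\Big|_{\hat{\psi}}(\psi^t - \hat{\psi})\,dt + \mathbf{dw},$$

integrating from $\psi^t_k - \hat{\psi}_k(+)$ over $\Delta t_{k+1}$ and denoting by $\boldsymbol{\Phi}(t_{k+1}, t_k)$ the corresponding state transition matrix, one has

$$\psi^t_{k+1} - \hat{\psi}_{k+1}(-) = \boldsymbol{\Phi}(t_{k+1}, t_k)[\psi^t_k - \hat{\psi}_k(+)] + \mathbf{w}_{k+1},$$

where $\mathbf{w}_{k+1}$ accounts for the integrated effects of $\mathbf{dw}$ over $\Delta t_{k+1}$ in (50). Inserting this simplification (51) into (45) yields, since $\mathbf{w}_{k+1}$ is independent of $\psi^t_k - \hat{\psi}_k(+)$,

$$\mathbf{L}_k = \mathbf{P}_k(+)\boldsymbol{\Phi}^T(t_{k+1}, t_k)\mathbf{P}^{-1}_{k+1}(-),$$

the gain of the classic linear fixed-interval smoother (e.g., Gelb 1974).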
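One backward step of this linear smoother can be sketched as follows (Python with NumPy; the function `smoother_step` and the scalar example values are illustrative assumptions). In the scalar example, $N = k + 1$: the state at $t_k$ has $\mathbf{P}_k(+) = 1$, $\boldsymbol{\Phi} = 1$, the model error variance is 1, and a direct observation of variance 2 at $t_{k+1}$ equals 4. The smoothed mean 1 and variance 0.75 agree with the direct Gaussian conditional mean and variance of the state at $t_k$ given that observation.

```python
import numpy as np

def smoother_step(psi_f, P_f, psi_fc, P_fc, psi_s_next, P_s_next, Phi):
    """One backward step of the linear fixed-interval smoother.

    psi_f, P_f: filtering estimate at t_k, psi_k(+) and P_k(+)
    psi_fc, P_fc: forecast to t_{k+1}, psi_{k+1}(-) and P_{k+1}(-)
    psi_s_next, P_s_next: smoothing estimate at t_{k+1}, psi_{k+1/N} and P_{k+1/N}
    Phi: state transition matrix Phi(t_{k+1}, t_k)
    """
    L = P_f @ Phi.T @ np.linalg.inv(P_fc)                 # L_k for linear dynamics
    psi_s = psi_f + L @ (psi_s_next - psi_fc)             # psi_{k/N}
    P_s = P_f + L @ (P_s_next - P_fc) @ L.T               # P_{k/N}
    return psi_s, P_s, L

# Scalar example with N = k + 1 (all values hypothetical).
psi_f, P_f = np.array([0.0]), np.array([[1.0]])
Phi, Qw, R = np.eye(1), np.eye(1), np.array([[2.0]])
psi_fc, P_fc = Phi @ psi_f, Phi @ P_f @ Phi.T + Qw       # forecast: 0 and 2
K = P_fc @ np.linalg.inv(P_fc + R)                        # direct observation, K = 0.5
psi_a = psi_fc + K @ (np.array([4.0]) - psi_fc)           # psi_{k+1}(+) = psi_{k+1/N} = 2
P_a = (np.eye(1) - K) @ P_fc                              # P_{k+1}(+) = P_{k+1/N} = 1
psi_s, P_s, L = smoother_step(psi_f, P_f, psi_fc, P_fc, psi_a, P_a, Phi)
print(psi_s, P_s)   # [1.] [[0.75]]
```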

### b. ESSE and smoothing via statistical approximation

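Assuming that $\hat{\psi}_{k+1/N}$, and $\mathbf{E}_{k+1/N}$ and $\boldsymbol{\Pi}_{k+1/N}$ defining $\mathbf{P}^p_{k+1/N}$, have been obtained, the unknowns are $\hat{\psi}_{k/N}$, $\mathbf{E}_{k/N}$, and $\boldsymbol{\Pi}_{k/N}$. Starting from the nonlinear filtering ESSE scheme run forward until $t_N$ (section 6), a Monte Carlo sample estimate of $\mathbf{L}_k$ in (45) is, at large $q$,

$$\mathbf{L}_k \simeq \mathbf{M}_k(+)\mathbf{M}^T_{k+1}(-)[\mathbf{M}_{k+1}(-)\mathbf{M}^T_{k+1}(-)]^{\dagger} = \mathbf{M}_k(+)\mathbf{M}^{\dagger}_{k+1}(-),$$

where $\mathbf{M}_k(+)$ and $\mathbf{M}_{k+1}(-)$ are defined as in section 6. The $\dagger$ denotes the Moore–Penrose generalized inverse. Note that in (53), $q = q_k(+) = q_{k+1}(-)$ so that all multiplications are feasible. Truncating the sample matrices in (53) to their dominant SVD,

$$\mathbf{L}^p_k \doteq \mathrm{SVD}_p[\mathbf{M}_k(+)]\,\mathrm{SVD}_p[\mathbf{M}_{k+1}(-)]^{\dagger} = \mathbf{E}_k(+)\boldsymbol{\Gamma}_k\mathbf{E}^T_{k+1}(-),$$

and the smoothing estimate (46) becomes

$$\hat{\psi}_{k/N} = \hat{\psi}_k(+) + \mathbf{L}^p_k[\hat{\psi}_{k+1/N} - \hat{\psi}_{k+1}(-)].$$

The covariance associated with (46) and (55) can be derived similarly to (47)–(49); one obtains

$$\mathbf{P}^p_{k/N} = \mathbf{P}^p_k(+) + \mathbf{L}^p_k[\mathbf{P}^p_{k+1/N} - \mathbf{P}^p_{k+1}(-)]\mathbf{L}^{pT}_k = \mathbf{E}_k(+)\{\boldsymbol{\Pi}_k(+) + \boldsymbol{\Gamma}_k[\boldsymbol{\theta}_{k+1}\boldsymbol{\Pi}_{k+1/N}\boldsymbol{\theta}^T_{k+1} - \boldsymbol{\Pi}_{k+1}(-)]\boldsymbol{\Gamma}^T_k\}\mathbf{E}^T_k(+),$$

with

$$\boldsymbol{\Gamma}_k \doteq \boldsymbol{\Sigma}_k(+)\mathbf{V}^T_k(+)\mathbf{V}_{k+1}(-)\boldsymbol{\Sigma}^{-1}_{k+1}(-), \qquad \boldsymbol{\theta}_{k+1} \doteq \mathbf{E}^T_{k+1}(-)\mathbf{E}_{k+1/N}, \qquad \boldsymbol{\Pi}_k \doteq \frac{1}{q_k}\boldsymbol{\Sigma}^2_k$$

for all $k$. Hence, the smoothing update of the ES characteristics consists of the eigendecomposition of the $p\times p$ matrix in braces, whose eigenvectors rotate $\mathbf{E}_k(+)$ into $\mathbf{E}_{k/N}$ and whose eigenvalues form $\boldsymbol{\Pi}_{k/N}$, so that

$$\mathbf{P}^p_{k/N} = \mathbf{E}_{k/N}\boldsymbol{\Pi}_{k/N}\mathbf{E}^T_{k/N}.$$

The smoothing can be iterated: starting from $\hat{\psi}^1_{0/N}$, a new filtering to $t_N$ can be carried out, followed by the statistical smoothing back to $t_0$, which yields $\hat{\psi}^2_{0/N}$.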
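A sketch of one backward ESSE smoothing step follows (Python with NumPy). The function names `tsvd` and `esse_smoother_step` and the synthetic spread matrices are hypothetical; the spread matrices are centered, so their rank is $q - 1$ and `p` is chosen accordingly. The step forms $\boldsymbol{\Gamma}_k$ from the two truncated SVDs, updates the state, and diagonalizes the $p\times p$ covariance core to obtain $\mathbf{E}_{k/N}$ and $\boldsymbol{\Pi}_{k/N}$.

```python
import numpy as np

def tsvd(M, p):
    """Dominant rank-p SVD: E (n x p), singular values (p,), V (q x p)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U[:, :p], s[:p], Vt[:p].T

def esse_smoother_step(M_k, M_k1, psi_k, psi_k1, psi_k1_N, E_k1_N, pi_k1_N, p):
    """One backward ESSE smoothing step (sketch).

    M_k, M_k1: spread matrices M_k(+) and M_{k+1}(-) with the same q columns.
    psi_k, psi_k1: psi_k(+) and psi_{k+1}(-); psi_k1_N, E_k1_N, pi_k1_N: smoothing at t_{k+1}.
    """
    q = M_k.shape[1]                                      # q = q_k(+) = q_{k+1}(-)
    E_k, s_k, V_k = tsvd(M_k, p)
    E_k1, s_k1, V_k1 = tsvd(M_k1, p)
    Gamma = (s_k[:, None] * (V_k.T @ V_k1)) / s_k1[None, :]   # Sigma_k V_k^T V_k1 Sigma_k1^-1
    L = E_k @ Gamma @ E_k1.T                              # L^p_k
    psi_k_N = psi_k + L @ (psi_k1_N - psi_k1)             # smoothing estimate at t_k
    theta = E_k1.T @ E_k1_N                               # theta_{k+1}
    core = np.diag(s_k**2 / q) + Gamma @ (theta @ np.diag(pi_k1_N) @ theta.T
                                          - np.diag(s_k1**2 / q)) @ Gamma.T
    pi_k_N, H = np.linalg.eigh(0.5 * (core + core.T))
    order = np.argsort(pi_k_N)[::-1]
    return psi_k_N, E_k @ H[:, order], pi_k_N[order], L

rng = np.random.default_rng(7)
n, q, p = 6, 5, 4
X = rng.standard_normal((n, q))
M_k = X - X.mean(axis=1, keepdims=True)                  # synthetic M_k(+), rank q - 1
Y = rng.standard_normal((n, q))
M_k1 = Y - Y.mean(axis=1, keepdims=True)                 # synthetic M_{k+1}(-)
psi_k, psi_k1 = np.zeros(n), np.zeros(n)
E_k1_N, _ = np.linalg.qr(rng.standard_normal((n, p)))    # synthetic smoothing ES at t_{k+1}
pi_k1_N = np.array([0.5, 0.3, 0.2, 0.1])
psi_k1_N = rng.standard_normal(n)
psi_k_N, E_k_N, pi_k_N, L = esse_smoother_step(M_k, M_k1, psi_k, psi_k1,
                                               psi_k1_N, E_k1_N, pi_k1_N, p)
```

When `p` equals the rank of the spread matrices, the ES gain coincides with $\mathbf{M}_k(+)\mathbf{M}^{\dagger}_{k+1}(-)$; smaller `p` keeps only the dominant error directions.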

### c. Discussion

In representer-based smoothing, the first guess is commonly a pure forecast, integrated without data from $t_0$ to $t_N$. Predictability errors, as well as numerical and analytical model deficiencies, can then dominate before reaching $t_N$. Driving the pure forecast back to the true ocean evolution by quadratic minimization is therefore numerically difficult. This issue could be solved by modifying the original representer equations,

$$\hat{\psi}_k = \hat{\psi}^f_k + \sum_l \tilde{\mathbf{R}}_{kl}\mathbf{b}_l$$

(Robinson et al. 1998a), into

$$\hat{\psi}_{k/N} = \hat{\psi}_k(+) + \sum_l \tilde{\mathbf{R}}_{kl}(+)\mathbf{b}_l(+),$$

where $\tilde{\mathbf{R}}_{kl}(+) \doteq [\mathbf{r}^1_{kl}(+), \ldots, \mathbf{r}^m_{kl}(+)] \in \mathbb{R}^{n\times m}$ are a posteriori representers and $\mathbf{b}_l(+) \in \mathbb{R}^m$ their coefficients. Equations for $\tilde{\mathbf{R}}_{kl}(+)$ and $\mathbf{b}_l(+)$ can be derived from the linearization of Tables 6–7. Other advantages of Table 7 include the efficient reduction to the dominant errors; the orthogonality properties of the SVD leading to simplified, efficient computations; and the evaluation of the a posteriori error covariances (58a)–(58b). Table 7 has been successfully utilized for PE process studies in the Levantine Basin, whose first results are partially described in Lermusiaux (1997).

When data become available at $t_{k+1}$, a first approach is to set

$$\hat{\psi}_{k/N} = \hat{\psi}_k(+) + \mathbf{L}_k[\hat{\psi}_{k+1}(+) - \hat{\psi}_{k+1}(-)],$$

which is a *shooting* method toward future data. It is local, $\hat{\psi}_{k/N}$ being approximated by $\hat{\psi}_{k/k+1}$. A second approach minimizes a cost function over the whole interval $[t_0, t_N]$, with an additional term for the model parameters (e.g., initial and boundary conditions), as in iterative adjoint methods. Deriving simple nonlinear subspace methods for iterative adjoint strategies is valuable. Miller and Cornuelle (1999) have successfully utilized such a reduced-state inverse method for initialization. Assuming perfect dynamics and a diagonal data error covariance matrix, the authors employed an ensemble of large-scale horizontal structure functions to adjust the initial conditions to the future data.

## 8. Summary and conclusions

Utilizing basic properties and heuristic characteristics of geophysical measurements and models, a nonlinear assimilation approach has been outlined. The concept of an evolving error subspace (ES), of variable size, that spans and tracks the scales and processes where the dominant errors occur was defined. The word “dominant” is naturally specified by the error measure used in the chosen optimal estimation criterion. Truncating this criterion to its evolving ES defines the error subspace statistical estimation approach (ESSE). The general goal of ESSE is to determine the considered ocean–atmosphere state evolution by minimizing the dominant errors, in accord with the full dynamical and measurement model constraints, and their respective uncertainties. This rational approach, satisfying realistic and practical goals while addressing geophysical issues, leads to efficient estimation schemes. For the minimum error variance criterion, the ES is characterized by time-variant principal error vectors and values. In general, the size and span of the ES evolve in time. The meaning and validity of the dominant truncation of the error space was addressed, with a focus on synoptic/mesoscale to large-scale geophysical applications. The ES was intercompared to variations of variability subspaces and its properties were discussed. In Part II of this study, primitive-equation simulations of Middle Atlantic Bight shelfbreak front evolutions are employed to assess and exemplify the capabilities of an ESSE system. The approach has also been used in real-time forecasting for North Atlantic Treaty Organization (NATO) operations in the Strait of Sicily and in the simulation and study of the spreading of the Levantine intermediate water (Lermusiaux 1997). Other real-time simulations were carried out in the Ionian Sea and Gulf of Cadiz (Robinson et al. 1998b).

In this first part, filtering and smoothing schemes for nonlinear assimilation via ESSE are derived (Figs. 1, 2). The time integration of the ES is based on Monte Carlo ensemble forecasts. The a posteriori members sample the current ES and are forecast using the full nonlinear stochastic model. The melding criterion minimizes variance in the ES and is much less costly than classical analyses involving full covariances. The statistical smoothing keeps all nonlinearities in computing expectations and is carried out from the nonlinear filtering solution, which is already corrected by data. For given computer resources, the representation of errors is energetically optimal. The dynamical forecast is corrected by data where the forecast errors are most energetic. The assimilation is multivariate and three-dimensional in physical space. The model nonlinearities and model errors are accounted for, and their effects on forecast errors explicitly considered. The SVD facilitates the analysis of the evolving error covariance. The scheme is well suited to parallel computers. The formalism, while conceptually simple, can be complex in its details. Determining mathematical systems that describe and track the dominant errors is challenging. Several specific components and variations of the present schemes are provided in appendix B. Dynamical systems for tracking and learning the ES (Brockett 1991) are derived in Lermusiaux (1997).

The concepts introduced by the subspace approach are as useful as the practical benefits. The ESSE formalism defines a theoretical basis for rational intercomparison of other reduced dimension methods. As discussed in the text, these methods have led to numerous controversies and, in fact, several of their respective assumptions were simply shown to be incompatible. By definition, the present approach can rationally validate specific a priori error hypotheses for tailored applications. In accord with the evolution of the deterministic dynamics, model errors, and data available, ESSE may, for instance, lead to different simplified assimilation schemes for weather or ocean forecasting. In dynamical modeling, specific interests lead to verified a priori dynamical assumptions; similarly, in assimilation there are a priori error assumptions to verify. The focus on the dominant errors also fosters the testing and correction of existing dynamical models (e.g., coding mistakes are often associated with a large error growth). It equally calls for future observations that span the dominant forecast errors, or a search for optimal data (Lermusiaux 1997). ESSE in fact provides a feasible quantitative approach to both dynamical model improvements and adequate field sampling. Finally, aside from the assimilation framework, the present scheme has other applications. Turning off the assimilation in Fig. 2, one can study the impact of the dominant stochastic model errors. This is a new research area and ESSE can validate specific stochastic models for use in simulations. Without model errors and assimilation, the statistical estimation of the variations of variability and stability subspaces is considered. Predictability and stability properties (e.g., Farrell and Ioannou 1996a,b) can thus also be decomposed and analyzed. Fixing the estimation time yields an objective analysis scheme.
In general, the range of applications includes nonlinear field and error forecasting, data-driven simulations, model improvements, adaptive sampling, and parameter estimation. The accurate tracking and specification of the dominant errors hence appears of paramount importance, even from a fundamental point of view.

## Acknowledgments

We are especially thankful to Dr. Carlos J. Lozano for his continuous interest and helpful collaboration. We thank Mr. Todd Alcock, Mr. Michael Landes, and Mrs. Renate D’Arcangelo for preparing some of the figures and portions of this manuscript. We are grateful to two anonymous referees for their excellent reviews. PFJL is very indebted to Professors Donald G. Anderson, Andrew F. Bennett, Roger W. Brockett, and Brian F. Farrell, members of his dissertation committee, for their challenging guidance and encouragements. PFJL also benefited greatly from several members of the Harvard Oceanography Group, past and present. This study was supported in part by the Office of Naval Research under Grants N00014-95-1-0371 and N00014-97-1-0239 to Harvard University.

## REFERENCES

Anderson, J. L., 1996: Selection of initial conditions for ensemble forecasts in a simple perfect model framework.

*J. Atmos. Sci.,***53,**22–36.Austin, J. W., and C. T. Leondes, 1981: Statistically linearized estimation of reentry trajectories.

*IEEE Trans. Aerosp. Electron. Syst.,***17**(1), 54–61.Bendat, J. S., and A. G. Piersol, 1986:

*Random Data Analysis and Measurement Procedures.*John Wiley and Sons, 566 pp.Bengtsson, L., M. Ghil, and E. Kallen, Eds., 1981:

*Dynamic Meteorology: Data Assimilation Methods.*Springer-Verlag, 330 pp.Bennett, A. F., 1992: Inverse methods in physical oceanography.

*Cambridge Monographs on Mechanics and Applied Mathematics,*Cambridge University Press, 346 pp.——, B. S. Chua, and L. M. Leslie, 1996: Generalized inversion of a global numerical weather prediction model.

*Meteor. Atmos. Phys.,***60**(1–3), 165–178.——, ——, and ——, 1997: Generalized inversion of a global numerical weather prediction model, II: Analysis and implementation.

*Meteor. Atmos. Phys.,***62**(3–4), 129–140.Bergé, P., Y. Pomeau, and C. Vidal, 1988:

*L’Ordre dans le Chaos. Vers une Approche Déterministe de la Turbulence.*Wiley Interscience, 329 pp.Boguslavskij, I. A., 1988:

*Filtering and Control.*Optimization Software, 380 pp.Brockett, R. W., 1991: Dynamical systems that learn subspaces.

*Mathematical Systems Theory: The Influence of R. E. Kalman,*A. Antoulas, Ed., Springer-Verlag, 579–592.Burgers, G., P. J. van Leeuwen, and G. Evensen, 1998: Analysis scheme in the ensemble Kalman filter.

*Mon. Wea. Rev.,***126,**1719–1724.Catlin, D. E., 1989:

*Estimation, Control, and the Discrete Kalman Filter.*Vol. 71,*Applied Mathematical Sciences,*Springer-Verlag, 274 pp.Charney, J. G., and G. R. Flierl, 1981: Oceanic analogues of large-scale atmospheric motions.

*Evolution of Physical Oceanography: Scientific Surveys in Honor of Henry Stommel,*B. Warren and G. Wunsch, Eds., The MIT Press, 504–548.Charnock, H., 1981: Air–sea interaction.

*Evolution of Physical Oceanography: Scientific Surveys in Honor of Henry Stommel,*B. Warren and G. Wunsch, Eds., The MIT Press, 482–503.Cho, Y., V. Shin, M. Oh, and Y. Lee, 1996: Suboptimal continuous filtering based on the decomposition of the observation vector.

*Comput. Math. Appl.,***32**(4), 23–31.Cohn, S. E., 1993: Dynamics of short-term univariate forecast error covariances.

*Mon. Wea. Rev.,***121,**3123–3149.——, and D. F. Parrish, 1991: The behavior of forecast error covariances for a Kalman filter in two dimensions.

*Mon. Wea. Rev.,***119,**1757–1785.Daley, R., 1991:

*Atmospheric Data Analysis.*Cambridge University Press, 457 pp.——, 1992a: The lagged innovation covariance: A performance diagnostic for atmospheric data assimilation.

*Mon. Wea. Rev.,***120,**178–196.——, 1992b: Forecast-error statistics for homogeneous and inhomogeneous observation networks.

*Mon. Wea. Rev.,***120,**627–643.——, 1992c: Estimating model-error covariances for application to atmospheric data assimilation.

*Mon. Wea. Rev.,***120,**1735–1746.Davis, M. H. A., 1977a:

*Linear Estimation and Stochastic Control.*Chapman-Hall, 224 pp.Davis, R. E., 1977b: Techniques for statistical analysis and prediction of geophysical fluid systems.

*Geophys. Astrophys. Fluid Dyn.,***8,**245–277.Dee, D. P., 1990: Simplified adaptive Kalman filtering for large-scale geophysical models.

*Realization and Modelling in System Theory,*M. A. Kaashoek, J. H. van Schuppen, and A. C. M. Ran, Eds.,*Proceedings of the International Symposium MTNS-89,*Vol. 1, Birkhäuser, 567–574.——, S. E. Cohn, A. Dalcher, and M. Ghil, 1985: An efficient algorithm for estimating noise covariances in distributed systems.

*IEEE Trans. Control.,***AC-30,**1057–1065.Ehrendorfer, M., and R. M. Errico, 1995: Mesoscale predictability and the spectrum of optimal perturbations.

*J. Atmos. Sci.,***52,**3475–3500.Errico, R. M., T. E. Rosmond, and J. S. Goerss, 1993: A comparison of analysis and initialization increments in an operational data-assimilation system.

*Mon. Wea. Rev.,***121,**579–588.Evensen, G., 1993: Open boundary conditions for the extended Kalman filter with a quasi-geostrophic ocean model.

*J. Geophys. Res.,***98,**16 529–16 546.——, 1994a: Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte Carlo methods to forecast error statistics.

*J. Geophys. Res.,***99**(C5), 10 143–10 162.——, 1994b: Inverse methods and data assimilation in nonlinear ocean models.

*Physica D,***77,**108–129.——, 1997a: Advanced data assimilation for strongly nonlinear dynamics.

*Mon. Wea. Rev.,***125,**1342–1354.——, 1997b: Application of ensemble integrations for predictability studies and data assimilation.

*Monte Carlo Simulations in Oceanography: Proc. Hawaiian Winter Workshop,*Honolulu, HI, Office of Naval Research and School of Ocean and Earth Science and Technology, University of Hawaii at Manoa, 11–22.——, and P. J. van Leeuwen, 1996: Assimilation of Geosat altimeter data for the Agulhas Current using the ensemble Kalman filter with a quasigeostrophic model.

*Mon. Wea. Rev.,***124,**85–96.Farrell, B. F., and A. M. Moore, 1992: An adjoint method for obtaining the most rapidly growing perturbation to the oceanic flows.

*J. Phys. Oceanogr.,***22,**338–349.——, and P. J. Ioannou, 1996a: Generalized stability theory. Part I: Autonomous operators.

*J. Atmos. Sci.,***53,**2025–2040.——, and ——, 1996b: Generalized stability theory. Part II: Nonautonomous operators.

*J. Atmos. Sci.,***53,**2041–2053.Foias, C., and R. Teman, 1977: Structure of the set of stationary solutions of the Navier–Stokes equations.

*Commun. Pure Appl. Math.,***30,**149–164.——, and ——, 1987: The connection between the Navier–Stokes equations, dynamical systems and turbulence.

*Directions in Partial Differential Equations,*M. G. Grandall, P. H. Rabinowitz, and E. E. L. Turner, Eds., Academic Press, 55–73.Fukumori, I., and P. Malanotte-Rizzoli, 1995: An approximate Kalman filter for ocean data assimilation: An example with one idealized Gulf Stream model.

*J. Geophys. Res.,***100,**6777–6793.——, J. Benveniste, C. Wunsch, and D. B. Haidvogel, 1993: Assimilation of sea surface topography into an ocean circulation model using a steady-state smoother.

*J. Phys. Oceanogr.,***23,**1831–1855.Gamage, N., and W. Blumen, 1993: Comparative analysis of low-level cold fronts: Wavelet, Fourier, and empirical orthogonal function decompositions.

*Mon. Wea. Rev.,***121,**2867–2878.Gelb, A., Ed., 1974:

*Applied Optimal Estimation.*The MIT Press, 374 pp.Ghil, M., 1989: Meteorological data assimilation for oceanographers. Part I: Description and theoretical framework.

*Dyn. Atmos. Oceans,***13**(3–4), 171–218.Hasselmann, K., 1988: PIPs and POPs. A general formalism for the reduction of dynamical systems in terms of principal interaction patterns and principal oscillation patterns.

*J. Geophys. Res.,***93,**11 015–11 021.Horn, R. A., and C. R. Johnson, 1985:

*Matrix Analysis.*Cambridge University Press, 561 pp.——, and ——, 1991:

*Topics in Matrix Analysis.*Cambridge University Press, 607 pp.Jazwinski, A. H., 1970:

*Stochastic Processes and Filtering Theory.*Academic Press, 376 pp.Jiang, S., and M. Ghil, 1993: Dynamical properties of error statistics in a shallow-water model.

*J. Phys. Oceanogr.,***23,**2541–2566.Kolmogorov, A. N., 1941:

*Dokl. Akad. Nauk SSSR,***30,**301;**32,**16.Le Dimet, F. X., and O. Talagrand, 1986: Variational algorithms for analysis and assimilation of meteorological observations.

*Tellus,***38A,**97–110.Lermusiaux, P. F. J., 1997: Error subspace data assimilation methods for ocean field estimation: Theory, validation and applications. Ph.D. thesis, Harvard University, Cambridge, MA, 402 pp.

——, 1999a: Data assimilation via error subspace statistical estimation. Part II: Middle Atlantic Bight shelfbreak front simulations and ESSE validation. *Mon. Wea. Rev.*, **127**, 1408–1432.

——, 1999b: Estimation and study of mesoscale variability in the Strait of Sicily. *Dyn. Atmos. Oceans*, in press.

Lorenc, A. C., 1986: Analysis methods for numerical weather prediction. *Quart. J. Roy. Meteor. Soc.*, **112**, 1177–1194.

——, 1992: Iterative analysis using covariance functions and filters. *Quart. J. Roy. Meteor. Soc.*, **118**, 569–591.

——, R. S. Bell, and B. Macpherson, 1991: The Meteorological Office analysis correction data assimilation scheme. *Quart. J. Roy. Meteor. Soc.*, **117**, 59–89.

Lorenz, E. N., 1963: Deterministic nonperiodic flow. *J. Atmos. Sci.*, **20**, 130–141.

——, 1965: A study of the predictability of a 28-variable atmospheric model. *Tellus*, **17**, 321–333.

Lozano, C. J., A. R. Robinson, H. G. Arango, A. Gangopadhyay, N. Q. Sloan, P. J. Haley, and W. G. Leslie, 1996: An interdisciplinary ocean prediction system: Assimilation strategies and structured data models. *Modern Approaches to Data Assimilation in Ocean Modeling*, P. Malanotte-Rizzoli, Ed., Elsevier Oceanography Series, Elsevier Science, 413–432.

Martel, F., and C. Wunsch, 1993: Combined inversion of hydrography, current meter data and altimetric elevations for the North Atlantic circulation. *Manuscripta Geodaetica*, **18** (4), 219–226.

McWilliams, J. C., W. B. Owens, and B. L. Hua, 1986: An objective analysis of the POLYMODE Local Dynamics Experiment. Part I: General formalism and statistical model selection. *J. Phys. Oceanogr.*, **16**, 483–504.

Miller, A. J., and B. D. Cornuelle, 1999: Forecasts from fits of frontal fluctuations. *Dyn. Atmos. Oceans*, in press.

Miller, R. N., and M. A. Cane, 1989: A Kalman filter analysis of sea level height in the tropical Pacific. *J. Phys. Oceanogr.*, **19**, 773–790.

——, M. Ghil, and F. Gauthier, 1994: Data assimilation in strongly nonlinear dynamical systems. *J. Atmos. Sci.*, **51**, 1037–1056.

——, E. F. Carter, and S. T. Blue, cited 1998: Data assimilation into nonlinear stochastic models. [Available online at http://tangaroa.oce.orst.edu/stochast.html.]

Molteni, F., and T. N. Palmer, 1993: Predictability and finite-time instability of the northern winter circulation. *Quart. J. Roy. Meteor. Soc.*, **119**, 269–298.

——, R. Buizza, T. N. Palmer, and T. Petroliagis, 1996: The ECMWF ensemble prediction system: Methodology and validation. *Quart. J. Roy. Meteor. Soc.*, **122**, 73–119.

Monin, A. S., 1974: *Variability of the Oceans.* Wiley, 241 pp.

Moore, A. M., and B. F. Farrell, 1994: Using adjoint models for stability and predictability analysis. *NATO ASI Ser.*, Vol. 119, 217–239.

Mureau, R., F. Molteni, and T. N. Palmer, 1993: Ensemble prediction using dynamically conditioned perturbations. *Quart. J. Roy. Meteor. Soc.*, **119**, 299–323.

Osborne, A. R., and A. Pastorello, 1993: Simultaneous occurrence of low-dimensional chaos and colored random noise in nonlinear physical systems. *Phys. Lett. A*, **181**, 159–171.

Parrish, D. F., and S. E. Cohn, 1985: A Kalman filter for a two-dimensional shallow-water model: Formulation and preliminary experiments. Office Note 304, NOAA/NWS/NMC, 64 pp.

Penland, C., 1989: Random forcing and forecasting using principal oscillation pattern analysis. *Mon. Wea. Rev.*, **117**, 2165–2185.

Phillips, N. A., 1986: The spatial statistics of random geostrophic modes and first-guess errors. *Tellus*, **38A**, 314–332.

Preisendorfer, R. W., 1988: *Principal Component Analysis in Meteorology and Oceanography.* Elsevier, 426 pp.

Rabier, F., E. Klinker, P. Courtier, and A. Hollingsworth, 1996: Sensitivity of forecast errors to initial conditions. *Quart. J. Roy. Meteor. Soc.*, **122**, 121–150.

Reid, W. T., 1968: Generalized inverses of differential and integral operators. *Theory and Applications of Generalized Inverse of Matrices*, T. L. Bouillon and P. L. Odell, Eds., Lubbock, 1–25.

Robinson, A. R., 1989: *Progress in Geophysical Fluid Dynamics.* Vol. 26, *Earth-Science Reviews*, Elsevier Science.

——, M. A. Spall, L. J. Walstad, and W. G. Leslie, 1989: Data assimilation and dynamical interpolation in gulfcast experiments. *Dyn. Atmos. Oceans*, **13** (3–4), 301–316.

——, H. G. Arango, A. J. Miller, A. Warn-Varnas, P.-M. Poulain, and W. G. Leslie, 1996a: Real-time operational forecasting on shipboard of the Iceland–Faeroe frontal variability. *Bull. Amer. Meteor. Soc.*, **77**, 243–259.

——, ——, A. Warn-Varnas, W. G. Leslie, A. J. Miller, P. J. Haley, and C. J. Lozano, 1996b: Real-time regional forecasting. *Modern Approaches to Data Assimilation in Ocean Modeling*, P. Malanotte-Rizzoli, Ed., Elsevier Science, 455 pp.

——, J. Sellschopp, A. Warn-Varnas, W. G. Leslie, C. J. Lozano, P. J. Haley Jr., L. A. Anderson, and P. F. J. Lermusiaux, 1997: The Atlantic Ionian Stream. *J. Mar. Syst.*, in press.

——, P. F. J. Lermusiaux, and N. Q. Sloan III, 1998a: Data assimilation. *Processes and Methods*, K. H. Brink and A. R. Robinson, Eds., *The Sea: The Global Coastal Ocean I*, Vol. 10, John Wiley and Sons.

——, and Coauthors, 1998b: The Rapid Response 96, 97 and 98 exercises: The Strait of Sicily, Ionian Sea and Gulf of Cadiz. Harvard Open Ocean Model Rep., Rep. in Meteorology and Oceanography 57, 45 pp. [Available from Harvard Oceanography Group, DEAS, 29 Oxford St., Cambridge, MA 02138.]

Sasaki, Y., 1970: Some basic formalism in numerical variational analysis. *Mon. Wea. Rev.*, **98**, 875–883.

Schnur, R., G. Schmitz, N. Grieger, and H. von Storch, 1993: Normal modes of the atmosphere as estimated by principal oscillation patterns and derived from quasigeostrophic theory. *J. Atmos. Sci.*, **50**, 2386–2400.

Sundqvist, H., Ed., 1993: Special issue on adjoint applications in dynamic meteorology. *Tellus*, **45A** (5), 341–569.

Tarantola, A., 1987: *Inverse Problem Theory: Methods for Data Fitting and Model Parameter Estimation.* Elsevier, 613 pp.

Teman, R., 1991: Approximation of attractors, large eddy simulations and multiscale methods. *Proc. Roy. Soc. London*, **434A**, 23–29.

Todling, R., and M. Ghil, 1990: Kalman filtering for a two-layer two-dimensional shallow-water model. *Proc. WMO Int. Symp. on Assimilation of Observations in Meteorology and Oceanography*, Clermont-Ferrand, France, WMO, 454–459.

——, and S. E. Cohn, 1994: Suboptimal schemes for atmospheric data assimilation based on the Kalman filter. *Mon. Wea. Rev.*, **122**, 2530–2557.

——, and M. Ghil, 1994: Tracking atmospheric instabilities with the Kalman filter. Part I: Methodology and one-layer results. *Mon. Wea. Rev.*, **122**, 183–204.

Toth, Z., and E. Kalnay, 1993: Ensemble forecasting at NMC: The generation of perturbations. *Bull. Amer. Meteor. Soc.*, **74**, 2317–2330.

Uchino, E., M. Ohta, and H. Takata, 1993: A new state estimation method for a quantized stochastic sound system based on a generalized statistical linearization. *J. Sound Vibration*, **160** (2), 193–203.

van Leeuwen, P. J., and G. Evensen, 1996: Data assimilation and inverse methods in terms of a probabilistic formulation. *Mon. Wea. Rev.*, **124**, 2898–2913.

von Storch, H., and C. Frankignoul, 1998: Empirical modal decomposition in coastal oceanography. *Processes and Methods*, K. H. Brink and A. R. Robinson, Eds., *The Sea: The Global Coastal Ocean I*, Vol. 10, John Wiley and Sons, 419–455.

——, I. Bruns, I. Fischer-Bruns, and K. Hasselmann, 1988: Principal oscillation pattern analysis of the 30- to 60-day oscillation in general circulation model equatorial troposphere. *J. Geophys. Res.*, **93** (D9), 11 022–11 036.

Wallace, J. M., C. Smith, and C. S. Bretherton, 1992: Singular value decomposition of wintertime sea surface temperature and 500-mb height anomalies. *J. Climate*, **5**, 561–576.

Weare, B. C., and J. S. Nasstrom, 1982: Examples of extended empirical orthogonal function analyses. *Mon. Wea. Rev.*, **110**, 481–485.

West, B. J., and H. J. Mackey, 1991: Geophysical attractors may be only colored noise. *J. Appl. Phys.*, **69** (9), 6747–6749.

Wunsch, C., 1988: Transient tracers as a problem in control theory. *J. Geophys. Res.*, **93**, 8099–8110.

——, 1996: *The Ocean Circulation Inverse Problem.* Cambridge University Press, 456 pp.

## APPENDIX A

### Generic Assumptions: Stochastic Dynamical and Measurement Models

The state vector $\boldsymbol{\psi} \in \mathbb{R}^{n}$ and its dynamics are continuous in time. The observations are made at discrete times $t_k$, with $k = 0, \ldots, N$, and contained in $\mathbf{d}_k \in \mathbb{R}^{m}$. The time lag between observations is $\Delta t_k = t_k - t_{k-1}$. The values of the state vector at data times are denoted by $\boldsymbol{\psi}_k$. The sample path of the true ocean state, $\boldsymbol{\psi}^{t}$, is described by the deterministic evolution, $d\boldsymbol{\psi} = \mathbf{f}(\boldsymbol{\psi}, t)\,dt$, forced by random processes $d\mathbf{w} \in \mathbb{R}^{n}$:

$$
d\boldsymbol{\psi}^{t} = \mathbf{f}(\boldsymbol{\psi}^{t}, t)\,dt + d\mathbf{w}. \tag{A1}
$$

The measurement model is $\mathbf{d}_k = \mathbf{C}_k \boldsymbol{\psi}_k$, which leads to

$$
\mathbf{d}_k = \mathbf{C}_k \boldsymbol{\psi}^{t}_k + \mathbf{v}_k,
$$

where $\mathbf{v}_k \in \mathbb{R}^{m}$ are random processes. The Wiener processes $d\mathbf{w}(t)$ statistically represent the effect of model errors over $dt$, and $\mathbf{v}_k$ represents the measurement noise and measurement model errors at $t_k$. Classic assumptions are made (e.g., Jazwinski 1970; Daley 1991). The forcings $d\mathbf{w}(t)$ and $\mathbf{v}_k$ have zero mean and respective covariances $\mathbf{Q}(t)\,dt$ and $\mathbf{R}_k$, with $\mathcal{E}\{d\mathbf{w}(t+\delta)\,d\mathbf{w}^{T}(t)\} = 0\ \forall\ \delta \neq 0$; $\mathcal{E}\{\mathbf{v}_k \mathbf{v}_j^{T}\} = 0\ \forall\ k \neq j$; and $\mathcal{E}\{d\mathbf{w}(t)\,\mathbf{v}_k^{T}\} = 0\ \forall\ t, k$. In passing, the probability densities of the random forcings are not formally required to be Gaussian. Following Jazwinski (1970), for a given functional $g$, the notations $\mathcal{E}\{g\}$ and $\hat{g}$ refer to the statistical mean. For an ensemble of size $q$, the sample mean operator is written $\mathcal{E}^{q}\{\cdot\}$. The conditional mean state of $\boldsymbol{\psi}^{t}_k$ is written $\hat{\boldsymbol{\psi}}^{t}_k$, or simply $\hat{\boldsymbol{\psi}}_k$. At $t_k$, the white random processes $d\mathbf{w}(t_k)$ are uncorrelated with the error $(\hat{\boldsymbol{\psi}}_k - \boldsymbol{\psi}^{t}_k)$. The error covariance at $t_k$ is defined by

$$
\mathbf{P}_k \doteq \mathcal{E}\left\{(\hat{\boldsymbol{\psi}}_k - \boldsymbol{\psi}^{t}_k)(\hat{\boldsymbol{\psi}}_k - \boldsymbol{\psi}^{t}_k)^{T}\right\} \in \mathbb{R}^{n \times n}.
$$

To refer to quantities before and after assimilation, the adjectives a priori and a posteriori are used, respectively. In mathematical terms, a (−) and a (+) distinguish the two. For the singular vectors in section 6, the (−) and (+) are simplified to subscripts. A smoothing estimate at $t_k$ is denoted by $\hat{\boldsymbol{\psi}}_{k/N}$, where $k/N$ indicates that all observations made up to $t_N$ are used.
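To make the continuous–discrete setting of (A1) concrete, the following minimal sketch integrates sample paths with an Euler–Maruyama step, draws data through a linear measurement model, and forms the sample error covariance of an ensemble about its own mean with a 1/q normalization. It is not the ESSE code: it assumes Gaussian increments (the text does not require Gaussianity), constant positive definite **Q** and **R**, and a linear **C**; the function names are hypothetical.

```python
import numpy as np

def euler_maruyama(f, psi0, t0, t1, dt, Q, rng):
    """One sample path of d psi = f(psi, t) dt + dw, with dw ~ N(0, Q dt).

    Q is assumed constant and positive definite (Gaussian increments assumed).
    """
    psi = np.array(psi0, dtype=float)
    L = np.linalg.cholesky(Q)                 # Q = L L^T, so L z has covariance Q
    t = t0
    for _ in range(int(round((t1 - t0) / dt))):
        dw = np.sqrt(dt) * (L @ rng.standard_normal(psi.size))
        psi = psi + f(psi, t) * dt + dw
        t += dt
    return psi

def observe(C, psi, R, rng):
    """Measurement model d_k = C_k psi_k + v_k, with v_k ~ N(0, R_k)."""
    v = np.linalg.cholesky(R) @ rng.standard_normal(C.shape[0])
    return C @ psi + v

def sample_error_covariance(members):
    """(1/q) sum_j (psi_hat - psi_j)(psi_hat - psi_j)^T for members of shape (n, q),
    where psi_hat is the ensemble (sample) mean."""
    q = members.shape[1]
    err = members.mean(axis=1, keepdims=True) - members
    return err @ err.T / q
```

For example, stacking 200 paths of the damped model f(ψ, t) = −0.5ψ as columns and calling `sample_error_covariance` gives a symmetric 2 × 2 spread matrix whose mean state decays close to exp(−0.5).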

State vector augmentation is used in (A1) to describe time-correlated random forcings of the deterministic model. For example, random walks, ramps, or exponentially time-correlated random processes are treated as additional dynamics, so that the enlarged system (A1) is only excited by unbiased white noise (Gelb 1974). External forcings are assumed to be part of *ψ*^{t} in (A1), since they may evolve with time and feedbacks between external forcings and internal dynamics exist. Similarly, the boundary conditions, which are nonlinear relations between internal and boundary state variables, have an evolution equation; they are here part of (A1). Finally, parameter estimation is included in (A1) by adding a stochastic evolution equation for each parameter to be estimated. The products of parameters and original state variables then introduce new nonlinearities. To limit the size of the augmented *ψ*^{t}, the parameters can be expanded into (local) unknown coefficient functionals (parameter EOFs) instead of gridpoint-discretized fields.
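The augmentation idea can be sketched for the simplest case of an exponentially time-correlated forcing. In this minimal example, assumed for illustration only, a forcing b with time scale tau is added to the drift of ψ and is itself driven by white noise, so that the enlarged state [ψ, b] is excited only by white noise. The helper names are hypothetical, and the Euler–Maruyama step is one convenient discretization, not the one used in ESSE.

```python
import numpy as np

def augmented_drift(f, tau):
    """Drift of the augmented state x = [psi, b] (hypothetical helper).

    d psi = (f(psi, t) + b) dt      : b is a time-correlated forcing of psi
    d b   = -(b / tau) dt + dw_b    : first-order Gauss-Markov process with
                                      exponential autocorrelation, time scale tau
    The enlarged system is then excited only by the white noise dw_b.
    """
    def g(x, t):
        n = x.size // 2
        psi, b = x[:n], x[n:]
        return np.concatenate([f(psi, t) + b, -b / tau])
    return g

def simulate_augmented(f, psi0, tau, q_b, dt, n_steps, rng):
    """Euler-Maruyama path of the augmented system; returns shape (n_steps + 1, 2n)."""
    n = len(psi0)
    x = np.concatenate([np.asarray(psi0, dtype=float), np.zeros(n)])
    g = augmented_drift(f, tau)
    path = [x.copy()]
    for k in range(n_steps):
        dw = np.zeros(2 * n)
        dw[n:] = np.sqrt(q_b * dt) * rng.standard_normal(n)   # white noise enters b only
        x = x + g(x, k * dt) * dt + dw
        path.append(x.copy())
    return np.array(path)
```

Along a long path, the lag-one autocorrelation of b should approach exp(−dt/tau), which is a quick check that the colored forcing has been produced from white noise alone.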

## APPENDIX B

### Specifics and Variations of the ESSE Schemes

#### Normalization in multivariate ES

In most cases, ocean and atmosphere models are multivariate. For the ES estimation to be insensitive to field units, a normalization is needed (Preisendorfer 1988). Field nondimensionalization is not adequate. For instance, salinity in psu and temperature in Celsius have similar orders of magnitude but, relative to temperature variations, small errors in salinity can lead to large errors in velocities. It is not the fields but their errors that need to be comparable. Each sample error field is thus divided by its volume- and sample-averaged error variance. Details are in Lermusiaux (1997). The normalization is necessary in the SVDs and in the multivariate ES convergence criterion (appendix B, section b). For the minimum ES variance update (Table 3), the error singular vectors are redimensionalized prior to computations.
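A minimal sketch of such a field-by-field normalization follows. Two assumptions are made and should be read as such: the grid is uniform, so the volume average is a plain mean (non-uniform cells would need volume weights), and each field block is divided by the square root of its volume- and sample-averaged error variance, so that every normalized field has unit mean variance. The function name and the `fields` index map are hypothetical.

```python
import numpy as np

def normalize_error_samples(M, fields):
    """Normalize a multivariate error sample matrix field by field.

    M      : (n, q) array; column j is the j-th error sample.
    fields : dict mapping a field name to the row indices of that field in M.
    Returns the normalized matrix and the per-field scales, which are kept
    so that error vectors can be redimensionalized before the update.
    """
    Mn = np.empty_like(M, dtype=float)
    scales = {}
    for name, idx in fields.items():
        block = M[idx, :]
        scale = np.sqrt(np.mean(block ** 2))   # RMS error over grid points and samples
        scales[name] = scale
        Mn[idx, :] = block / scale
    return Mn, scales
```

With temperature errors of order 0.5 and salinity errors of order 0.05, both normalized blocks end up with unit mean variance, so neither field dominates the SVD merely because of its units.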

#### Quantitative ES divergence/convergence criteria

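Assume that $r \geq 1$ new forecasts have been carried out in a parallel batch (Fig. 2), and that the rank-$p$ SVD of the “previous” error sample matrix, $\mathbf{M} \in \mathbb{R}^{n \times q}$, is

$$
\mathrm{SVD}_{p}(\mathbf{M}) = \mathbf{E}\boldsymbol{\Sigma}\mathbf{V}^{T}, \tag{B1}
$$

where $\mathbf{E} \in \mathbb{R}^{n \times p}$, $\boldsymbol{\Sigma} \in \mathbb{R}^{p \times p}$, and $\mathbf{V}^{T} \in \mathbb{R}^{p \times q}$. Similarly, the rank-$\tilde{p}$ SVD of the matrix formed of the previous and the $r$ new error samples $\mathbf{M}^{r}$ is

$$
\mathrm{SVD}_{\tilde{p}}([\mathbf{M}, \mathbf{M}^{r}]) = \tilde{\mathbf{E}}\tilde{\boldsymbol{\Sigma}}\tilde{\mathbf{V}}^{T}, \tag{B2}
$$

where $\tilde{\mathbf{E}} \in \mathbb{R}^{n \times \tilde{p}}$, $\tilde{\boldsymbol{\Sigma}} \in \mathbb{R}^{\tilde{p} \times \tilde{p}}$, and $\tilde{\mathbf{V}}^{T} \in \mathbb{R}^{\tilde{p} \times \tilde{q}}$, with $\tilde{p} \geq p$. The associated principal error covariances are, respectively, $\mathbf{P}^{p} = \mathbf{E}\boldsymbol{\Pi}\mathbf{E}^{T}$, where $\boldsymbol{\Pi} = (1/q)\boldsymbol{\Sigma}^{2}$, and $\mathbf{P}^{\tilde{p}} = \tilde{\mathbf{E}}\tilde{\boldsymbol{\Pi}}\tilde{\mathbf{E}}^{T}$, where $\tilde{\boldsymbol{\Pi}} = (1/\tilde{q})\tilde{\boldsymbol{\Sigma}}^{2}$ and $\tilde{q} = q + r > q$. In accord with sections 4 and 5c, the goal is to compute the similarity between the amplitude and structure of these two covariances, that is, to find out how close $\mathbf{E}\boldsymbol{\Pi}^{1/2} \in \mathbb{R}^{n \times p}$ is to $\tilde{\mathbf{E}}\tilde{\boldsymbol{\Pi}}^{1/2} \in \mathbb{R}^{n \times \tilde{p}}$. For coherence with the variance measures (6a)–(6c), a logical similarity coefficient $\rho$ is

$$
\rho = \frac{\sum_{i=1}^{k} \sigma_i\left(\boldsymbol{\Pi}^{1/2}\mathbf{E}^{T}\tilde{\mathbf{E}}\tilde{\boldsymbol{\Pi}}^{1/2}\right)}{\operatorname{tr}(\tilde{\boldsymbol{\Pi}})}, \tag{B3}
$$

where $k = \min(\tilde{p}, p)$ and $\sigma_i(\cdot)$ selects the singular value number $i$. If $\operatorname{tr}(\tilde{\boldsymbol{\Pi}}) \geq \operatorname{tr}(\boldsymbol{\Pi})$, the coefficient $\rho \leq 1$. The equality holds when $\mathbf{P}^{\tilde{p}} = \mathbf{P}^{p}$; hence, when the $r$ new forecasts no longer modify the dominant error covariance, $\rho \simeq 1$. There are variations of (B3). One can also ensure that the variance and structure of each of the $r$ new forecasts can be sufficiently explained by $\mathbf{E}\boldsymbol{\Pi}^{1/2}$. There are then $r$ coefficients $\rho$ of form analogous to (B3) to evaluate. Other criteria consist of increasing the ensemble size until the new members yield insignificant reductions in the a posteriori data residuals or insignificant changes in $\hat{\boldsymbol{\psi}}(+)$.

#### Error subspace forecast variations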

To account efficiently for nonlinear effects, the ensemble method (Table 5) was preferred for integrating the dominant errors. Several alternative error forecasts are discussed next.

*Iterative error breeding* between DA times generalizes the breeding of perturbations of Toth and Kalnay (1993). This new breeding (Fig. 2) uses the error ensemble forecast to time *t*_{k+1} (Table 5) to iteratively improve the error initial condition at time *t*_{k}. Once the error ensemble forecast ℓ is made, the simplest approach is to rescale the ES forecast coefficients $\boldsymbol{\Pi}^{\ell}_{k+1}(-)$ to the amplitude of the initial coefficients $\boldsymbol{\Pi}^{\ell}_{k}(+)$, which yields $\boldsymbol{\Pi}^{\ell+1}_{k}(+)$, and to set the new initial error structures to the forecast ones, $\mathbf{E}^{\ell+1}_{k}(+) = \mathbf{E}^{\ell}_{k+1}(-)$. The ES $\{\boldsymbol{\Pi}^{\ell+1}_{k}(+), \mathbf{E}^{\ell+1}_{k}(+)\}$ is then forecast again. Another variation is *shooting*: one may shoot for an initial ES that leads to $\hat{\boldsymbol{\psi}}_{k+1}(+)$ at *t*_{k+1} (Fig. 2). This smoothing technique for determining optimal initial error conditions does not require an adjoint model.
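The breeding loop can be sketched as follows, under explicit assumptions: a toy linear operator stands in for the nonlinear ensemble error forecast, the forecast ES is obtained from an SVD of the forecast error modes, and the rescaling preserves the total initial error variance tr **Π**_{k}(+). That trace-preserving rescaling is our assumption of the "simplest approach," and the function name is hypothetical.

```python
import numpy as np

def breed_error_subspace(forecast, E0, Pi0, n_iter=50):
    """Iterative error breeding between two DA times (a sketch).

    forecast : callable mapping an (n, p) matrix of initial error vectors at t_k
               to the corresponding forecast error vectors at t_{k+1}.
    E0, Pi0  : initial ES structures (n, p, orthonormal columns) and variances (p,).
    Each iteration forecasts E Pi^{1/2}, takes the SVD of the result to obtain
    the forecast ES {E_f, Pi_f}, then restarts from E_f with Pi_f rescaled so
    that the total initial error variance tr(Pi0) is preserved (assumed rule).
    """
    E, Pi = E0, np.asarray(Pi0, dtype=float)
    target = Pi.sum()
    for _ in range(n_iter):
        F = forecast(E * np.sqrt(Pi))                 # columns: grown error modes
        E_f, s, _ = np.linalg.svd(F, full_matrices=False)
        Pi_f = s ** 2                                 # forecast principal error values
        E, Pi = E_f, Pi_f * (target / Pi_f.sum())     # rescale back to initial amplitude
    return E, Pi
```

With a linear forecast operator, this loop reduces to a subspace (power) iteration, so the bred ES turns toward the fastest-growing error directions while its amplitude stays at the initial level, as in Toth and Kalnay's breeding.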

A *tangent linear model* (TLM) and its adjoint can be used to search for the dominant right and left singular error vectors, embracing the classic search for optimal perturbations (e.g., Farrell and Moore 1992; Sundqvist 1993; Errico et al. 1993; Moore and Farrell 1994; Ehrendorfer and Errico 1995; Molteni et al. 1996). Since TLM forecasts are commonly shown to be similar to nonlinear forecasts only for a limited duration, TLMs should perhaps only be used to derive local adjoint models. In a search for singular vectors, the nonlinear model would then be run forward and the linear adjoint backward for approximate back integrations. In fact, through nonlinear interactions, the fastest-growing singular vectors of the TLMs interact with and modify the basic state the most and the fastest. The duration for which a linearly estimated singular vector is reliable thus decreases as the vector’s growth rate increases. Utilizing TLMs in forward computations for DA therefore requires care.
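The forward/adjoint search for the leading singular vectors can be illustrated with a power iteration on the product of adjoint and forward operators. In this sketch a small matrix stands in for the TLM and its transpose for the adjoint; in practice each call would be a forward tangent linear or backward adjoint model integration. The function name is hypothetical.

```python
import numpy as np

def leading_singular_vector(tlm, adjoint, n, n_iter=200, seed=0):
    """Power iteration on adjoint(tlm(x)) = M^T M x (a sketch).

    tlm     : callable x -> M x   (forward tangent linear integration)
    adjoint : callable y -> M^T y (backward adjoint integration)
    Returns the leading right singular vector x (initial error), the leading
    left singular vector (final error), and the growth factor |M x|.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)
    x /= np.linalg.norm(x)
    for _ in range(n_iter):
        x = adjoint(tlm(x))          # one forward and one backward integration
        x /= np.linalg.norm(x)
    y = tlm(x)
    sigma = np.linalg.norm(y)
    return x, y / sigma, sigma
```

Each iteration costs one forward and one backward integration, which is why the text's caveat matters: if the forward run is linearized, the result is only reliable over the duration for which the TLM tracks the nonlinear model.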

The filtering algorithms described by Tables 3–5 are based only on *Monte Carlo nonlinear error forecasts* so as to satisfy our goals (section 3). This approach is related to, but differs from, the strict ensemble scheme (Evensen 1994a,b, 1997b; van Leeuwen and Evensen 1996). Its theoretical and practical advantages are now discussed. First, the ES approach provides a framework for validating the ensemble scheme: it permits the quantitative assessment of the value added by new forecasts (appendix B, section b). For a given criterion, for example, *ρ* = 98% in (B3), the size of the ES is allowed to evolve with the dynamics and the data available (2a)–(2b). In light of intermittent and bursting ocean processes, and of the often eclectic and variable data coverage, this property is important. Second, the central processing unit (CPU) requirements are reduced. For *p* = *q,* the ensemble and ESSE melding lead to the same a posteriori estimates, but in ESSE only one melding is necessary. For *p* < *q,* the present melding occurs in the significant subspace of the sample errors, further reducing computations by a factor of at least *q*/*p* (Part II). Third, organizing errors according to their variance allows physical analyses of their dominant components. Such analyses can lead to adequate simplifications of DA schemes. If the errors are of numerical nature, they are usually distinguishable in the dominant ES structures; algorithms and codes can then be corrected, so that ESSE can also be used for model verification. The main advantage of the present statistical smoothing (Tables 6, 7) is the use of the nonlinear filtering evolution as the starting point of the smoothing. The classic ensemble smoother (e.g., Evensen and van Leeuwen 1996) starts from a pure forecast, which increases the potential for divergence. Finally, the present schemes open the door to DA based on subspace trackers (Lermusiaux 1997).
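A sketch of the ES convergence test follows, assuming that *ρ* is computed as the sum of the k = min(p, p̃) leading singular values of **Π**^{1/2}**E**^{T}**Ẽ****Π̃**^{1/2} divided by tr(**Π̃**), with **Π** = **Σ**^{2}/q and **Π̃** = **Σ̃**^{2}/(q + r). The error samples are assumed already normalized, and the function name is hypothetical. A driver loop would add batches of r forecasts until the returned value exceeds a threshold such as 0.98.

```python
import numpy as np

def similarity_coefficient(M_prev, M_new, p, p_tilde):
    """Similarity coefficient rho between previous and enlarged error subspaces.

    M_prev : (n, q) previous (normalized) error sample matrix.
    M_new  : (n, r) new error samples from the latest forecast batch.
    """
    q = M_prev.shape[1]
    q_t = q + M_new.shape[1]
    # Rank-p SVD of the previous samples: E (n, p), Pi = Sigma^2 / q.
    E, s, _ = np.linalg.svd(M_prev, full_matrices=False)
    E, Pi = E[:, :p], s[:p] ** 2 / q
    # Rank-p_tilde SVD of [M_prev, M_new]: E_t (n, p_tilde), Pi_t = Sigma_t^2 / q_t.
    E_t, s_t, _ = np.linalg.svd(np.hstack([M_prev, M_new]), full_matrices=False)
    E_t, Pi_t = E_t[:, :p_tilde], s_t[:p_tilde] ** 2 / q_t
    # Singular values of Pi^{1/2} E^T E_t Pi_t^{1/2}, summed over k = min(p, p_tilde).
    cross = np.sqrt(Pi)[:, None] * (E.T @ E_t) * np.sqrt(Pi_t)[None, :]
    k = min(p, p_tilde)
    return np.linalg.svd(cross, compute_uv=False)[:k].sum() / Pi_t.sum()
```

If the new samples merely repeat the old ones, the two subspaces and their variances coincide and the coefficient equals 1; new samples that point in unexplained directions lower it.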

In *real-time and management operations,* it might not be necessary to forecast the ES continuously. A stationary, historical, or climatological ES could suffice under specific conditions. If the dynamics and data statistics (2a)–(2b) are stationary, all possible dominant error vectors for the region studied can be evaluated in advance and stored, just as one can store the classic vertical EOFs of a region. Only the principal error values are then forecast. From experience, principal error value models could be derived; for instance, one may assume exponential growth between assimilations. Another practical method is to forecast the ES for a central assimilation time and to use it at other times for the time ramping of observations. *Analytical* ES can also be defined and projected onto the most dominant variability subspace of a given ocean state by a priori ensemble runs.
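As an illustration of a principal error value model, the sketch below keeps stored ES structures fixed and grows only the principal error values between assimilations. The exponential law (amplitudes growing as exp(λ_i t), hence variances as exp(2λ_i t)) and the growth rates are assumptions for illustration; both function names are hypothetical.

```python
import numpy as np

def grow_principal_values(Pi0, growth_rates, dt):
    """Hypothetical principal-error-value model: each error amplitude grows as
    exp(lambda_i t) between assimilations, so each variance grows as exp(2 lambda_i t)."""
    Pi0 = np.asarray(Pi0, dtype=float)
    rates = np.asarray(growth_rates, dtype=float)
    return Pi0 * np.exp(2.0 * rates * dt)

def covariance_from_stored_es(E_stored, Pi):
    """E diag(Pi) E^T from stored (e.g., climatological) ES structures E_stored (n, p).
    Formed explicitly here only for small n; large systems would keep E and Pi apart."""
    return (E_stored * Pi) @ E_stored.T
```

Only the p principal values are integrated in time, which is what makes this option attractive when the dominant error structures can be assumed stationary.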

#### Criterion for “best” forecast selection

Even when the probability density forecast associated with (2a)–(2b) is available, the question of what the estimate of the geophysical state should be remains essential, especially in practice (e.g., Robinson et al. 1989). Each sensible choice corresponds to a criterion defining the “best” or optimal forecast. Several such criteria are discussed next. The simple theoretical concepts presented are linked to the current schemes and illustrated with Gulf Stream scenarios.

A good estimate should obviously have small expected forecast errors. The state that is closest to the truth (2a)–(2b), in the sense of any convex loss function or measure^{6} of the expected error, is the conditional mean (34). Here its logical statistical estimate is the *ensemble mean*, $\hat{\boldsymbol{\psi}}^{em}_{k+1}(-) = \mathcal{E}^{q}\{\hat{\boldsymbol{\psi}}^{j}_{k+1}(-)\}$, the sample mean of the $q$ ensemble forecasts $\hat{\boldsymbol{\psi}}^{j}_{k+1}(-)$.
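The distinction between the mean of nonlinear forecasts and the forecast of a mean state can be shown with a toy model. In this sketch, the model dψ/dt = −ψ² (forward Euler) and the three-member ensemble are hypothetical, and the ensemble mean initial state plays the role of the a posteriori estimate from which a central forecast would be issued.

```python
import numpy as np

def toy_forecast(psi, dt=0.1, n_steps=10):
    """Hypothetical nonlinear model d psi/dt = -psi**2, forward Euler steps."""
    for _ in range(n_steps):
        psi = psi - dt * psi ** 2
    return psi

def ensemble_mean(members):
    """Sample mean E^q{.} over the q columns of members (shape (n, q))."""
    return members.mean(axis=1)

# Initial ensemble (n = 1 variable, q = 3 members) centered on the estimate 1.0.
initial = np.array([[0.5, 1.0, 1.5]])
psi_em = ensemble_mean(toy_forecast(initial))     # mean of the nonlinear forecasts
psi_cf = toy_forecast(ensemble_mean(initial))     # forecast of the mean state
```

Because each Euler step of this model is a concave map over the range of the members, the ensemble mean of the forecasts falls below the forecast of the mean (Jensen's inequality); with linear dynamics the two would coincide.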

Other options considered here are (Fig. 2) the central forecast (35), the most probable forecast, and the forecast of minimum data misfits. The nonlinear *central forecast* $\hat{\boldsymbol{\psi}}^{cf}_{k+1}(-)$ differs in general from $\hat{\boldsymbol{\psi}}^{em}_{k+1}(-)$, since the nonlinear forecast of a mean state is not the mean of the nonlinear forecasts […]. The *most probable forecast* can also be estimated from the ensemble. A probability density (histogram) is first computed for each of the $n$ state variables (i.e., elements of $\boldsymbol{\psi}$). The range of values taken by the $q$ members is divided into a given number of intervals and the members are assigned to their respective intervals, forming the histogram. For each variable, the most probable value is the center of the tallest segment or bar. The most probable forecast $\hat{\boldsymbol{\psi}}^{mp}_{k+1}(-)$ is the state formed of these $n$ most probable values. […] *local* conditional mean estimation. If the subset is not selected, $\hat{\boldsymbol{\psi}}^{mp}_{k+1}(-)$ […]. The *forecast of minimum data misfits* can be chosen among the stochastic ensemble. This is, in a sense, an adjoint approach in the data domain: instead of neglecting model errors, the more frequent small data errors are neglected. However, as for $\hat{\boldsymbol{\psi}}^{mp}_{k+1}(-)$, […]
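Two of these selection criteria can be sketched directly from an ensemble stored as an (n, q) array. The most probable forecast below follows the histogram description (per-variable bins over the members' range, center of the tallest bar; ties go to the first tallest bar), and the minimum-misfit member minimizes the R^{-1}-weighted data residual. The bin count, the linear measurement matrix, and both function names are assumptions for illustration.

```python
import numpy as np

def most_probable_forecast(members, bins=10):
    """Per-variable histogram mode of an ensemble of shape (n, q).

    For each state variable, the range of its q values is split into `bins`
    equal intervals and the center of the tallest bar is kept.
    """
    out = np.empty(members.shape[0])
    for i, values in enumerate(members):
        counts, edges = np.histogram(values, bins=bins)
        j = np.argmax(counts)                     # first tallest bar if tied
        out[i] = 0.5 * (edges[j] + edges[j + 1])
    return out

def minimum_misfit_member(members, C, d, R):
    """Index j minimizing (d - C psi_j)^T R^{-1} (d - C psi_j) over the members."""
    resid = d[:, None] - C @ members              # (m, q) data residuals
    cost = np.einsum("iq,iq->q", resid, np.linalg.solve(R, resid))
    return int(np.argmin(cost))
```

The histogram mode is computed variable by variable, so the resulting state need not coincide with any single member; the minimum-misfit choice, by contrast, always returns one actual ensemble member.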

#### Algorithmic issues

The matrix $\tilde{\mathbf{C}}^{p} = \mathbf{C}\mathbf{E}_{-}$ is evaluated so that $\mathbf{E}_{-}\boldsymbol{\Pi}(-)\mathbf{E}_{-}^{T}$ is never formed. The ES covariance mapped onto the data space, $\tilde{\mathbf{C}}^{p}\boldsymbol{\Pi}(-)\tilde{\mathbf{C}}^{pT}$, […] $\mathbf{R}$ […] $\mathbf{R}$ and $\tilde{\mathbf{C}}^{p} = \mathbf{C}\mathbf{E}_{-}$ are evaluated via pointers. Only non-null elements are retained. The size of the ES evolves according to (B3): when $\rho$ is larger than a threshold, iterations are stopped. Parallel computing is used, depending on availability. The cost of the ESSE system shown in Fig. 2 is often driven by the ensemble size. In terms of the number of forecasts made, this cost is of order $q$. The OI is then of order 1 and the full covariance scheme of order $n$. The representer method rescaled to filtering is of order $2m/2 = m$, where $m$ is the total number of scalar observations made during the assimilation period. Typical numbers for a 10-day period and three assimilations of 50 CTDs on 20 levels are $n = 3 \times 10^{5}$, $q = 300$, and $m = 6000$. Assuming that a 1-day forecast is issued in 30 min and that 30 CPUs are available, ESSE takes 50 h for the 10-day period ($300 \times 10 \times 0.5\ \mathrm{h} / 30$). The full covariance scheme takes almost 6 yr, and one iteration of the representer method almost 42 days. Additional computational aspects are discussed in Part II and in Lermusiaux (1997).
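The point that only **C̃**^{p} = **CE**_{−} and small matrices are needed can be illustrated with a minimum-variance linear update written entirely in the error subspace. This is a sketch of a standard Kalman-type update under the assumptions of a linear **C**, orthonormal ES vectors, and an explicit (dense) **R**; Table 3 is not reproduced here, so the exact ESSE update may differ, and the function name is hypothetical.

```python
import numpy as np

def es_update(psi_f, E, Pi, C, R, d):
    """Minimum-variance linear update restricted to the error subspace (sketch).

    psi_f : (n,) forecast state        E : (n, p) orthonormal ES vectors
    Pi    : (p,) ES error variances    C : (m, n) linear measurement matrix
    R     : (m, m) data error cov.     d : (m,) observations
    The n x n matrix E diag(Pi) E^T is never formed.
    """
    C_t = C @ E                                   # C~ = C E, ES mapped to data space (m, p)
    PiCt = Pi[:, None] * C_t.T                    # diag(Pi) C~^T, shape (p, m)
    S = C_t @ PiCt + R                            # C~ Pi C~^T + R, shape (m, m)
    innovation = np.linalg.solve(S, d - C @ psi_f)
    psi_a = psi_f + E @ (PiCt @ innovation)       # P C^T S^{-1} (d - C psi_f), P = E Pi E^T
    # A posteriori covariance E [Pi - Pi C~^T S^{-1} C~ Pi] E^T: eigendecompose the p x p core.
    core = np.diag(Pi) - PiCt @ np.linalg.solve(S, PiCt.T)
    w, U = np.linalg.eigh(core)
    order = np.argsort(w)[::-1]                   # largest a posteriori variance first
    return psi_a, E @ U[:, order], w[order]
```

For small n, the result can be checked against the full-covariance Kalman update with **P** = **EΠE**^{T}; the subspace version only ever handles (m × p), (p × p), and (m × m) matrices.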

Filtering via ESSE at *t*_{k}: Continuous–discrete problem statement.

Eigendecomposition of the minimum error variance linear update (index *k* omitted).

Minimum sample ES variance linear update (index *k* omitted).

SVD of the ensemble spread linear update (index *k* omitted).

Nonlinear dynamical state and ES ensemble forecast.

Smoothing via statistical linearization.

ESSE smoothing via statistical linearization.

^{1}

From among the terms used in the literature, “measurement model” was preferred. Observation model, measurement relation, or functional are also used (Gelb 1974; Catlin 1989; Daley 1991; Bennett 1992).

^{2}

In the present study, the term energy can also refer to a pseudoenergy or field squared amplitude.

^{3}

Relations between the dominant eigenvectors of **Q**(*t*) […]

^{4}

In this text, the term covariance relates to a matrix quantity. The covariances are dimensional but all eigendecompositions or SVDs are made on nondimensional fields so that the ordering of eigen- or singular values is unit independent. To simplify notations, the normalization is presented in appendix B, section a.

^{5}

For coherence with other works (e.g., Burgers et al. 1998), (19a,b) differ from the derivation of Lermusiaux (1997), which used true error sample matrices. The present scheme (A) still leads to the same results, since the variance of the error incurred by using an estimate of the truth in (19a)–(19b) instead of the truth itself is of $O(1/q)$.

^{6}

For simplicity, the class of loss functions defined in Jazwinski (1970, 146–150) is here referred to as convex measures. This terminology is, in fact, too restrictive: convex measures are only a subclass of that class, since the measure does not need to be convex.