## 1. Introduction

Data assimilation as a means of constructing the initial conditions for dynamical prediction models in meteorology has more than 50 years of history. It began in the late 1940s and early 1950s in anticipation of numerical weather prediction (NWP), which started in a research mode at Princeton’s Institute for Advanced Study (IAS) in 1946 [reviewed in Lynch (2006)]. By the mid-1950s, operational NWP commenced in Sweden and shortly thereafter in the United States (Wiin-Nielsen 1991). The first operational numerical weather map analysis, or objective analysis as it was then called, came from the work of Bergthórsson and Döös (1955): the B–D scheme.

The pragmatic and utilitarian B–D scheme established the following guidelines that became central to development of meteorological data assimilation: 1) use of a background field that, in their case, was a combination of a forecast from an earlier time (12 h earlier) and climatology; and 2) interpolation of an “increment” field, the difference between the forecast and observation at the site of the observation, to grid points as a means of adjusting the background. Two optimal approaches to data assimilation came in the wake of the B–D scheme. The first was a stochastic method designed by Eliassen (1954) with further development and operational implementation by Gandin (1965) at the National Meteorological Center (NMC), United States [reviewed in Bergman (1979)]. The second was a deterministic scheme developed by Sasaki (1958, 1970a,b,c) with operational implementation by Lewis (1972) at the U.S. Navy’s Fleet Numerical Weather Center (FNWC). The subsequent advancement of these two approaches became known as three-dimensional variational data assimilation (3DVAR) and four-dimensional variational data assimilation (4DVAR), respectively. A comprehensive review of the steps that led to these developments is found in the historical paper by Lewis and Lakshmivarahan (2008). As currently practiced, both 3DVAR and 4DVAR make use of a background, a forecast from an earlier time, and thereby embrace a Bayesian philosophy (Kalnay 2003; Lewis et al. 2006).

The subject of automatic control and feedback control in particular came into prominence in the immediate post–World War II (WWII) period (Wiener 1948) when digital computers became available and control of ballistic objects such as missiles and rockets took center stage in the Cold War era (Bennett 1996; Bryson 1996). Development of mathematical algorithms to optimally track rockets and artificial satellites and to efficiently and economically change their course became a fundamental theme in control theory. One of the algorithms developed during this period became known as Pontryagin’s minimum principle (PMP) (Pontryagin et al. 1962; Boltyanskii 1971, 1978; Bryson 1996, 1999). This principle, developed by Lev Pontryagin and his collaborators, is expressed in the following form: In the presence of dynamic constraints (typically differential equations of motion), find the best possible control for taking a dynamic system from one state to another. Essentially, this principle embodies the search for minimization of a cost function that contains the Euler–Lagrange search for the minimum. As will be shown in section 3, 4DVAR is a special case of PMP. We will test this methodology and concept in meteorological problems where the task will be to force the system toward observations in much the same spirit as the nudging method (Anthes 1974)—but importantly, in this case, the process is optimal (Lakshmivarahan and Lewis 2013).

In this paper we succinctly review the basis for the PMP as it applies to the determination of the optimal control/forcing by minimizing a performance functional that is a sum of two quadratic forms representing two types of energy, where the given model is used as a strong constraint. The first term of this performance functional is the total energy of the error, the difference between the observations (representing truth) and the model trajectory starting from an arbitrary initial condition. Minimization of this energy term has been the basis for the variational methods (Lewis et al. 2006). The second quadratic form represents the total energy in the control signal. It must be emphasized that the use of least energy to accomplish a goal is central to engineering design and distinguishes this approach from the traditional variational approaches to dynamic data assimilation. A family of optimal controls can be achieved by giving different weights to these two energy terms.

By introducing an appropriate Hamiltonian function, this approach based on PMP reduces the well-known second-order Euler–Lagrange equation to a system of two first-order canonical Hamiltonian equations, the like of which have guided countless developments in physics (Goldstein 1950, 1980). While Kuhn and Tucker (1951) extended the Lagrangian technique for equality constraints to include inequality constraints by developing the theory of nonlinear programming for static problems, Pontryagin et al. (1962) used this Hamiltonian formulation to extend the classical Euler–Lagrange formulation in the calculus of variations. This extension has been the basis for significant development of optimal control theory in the dynamical setting. The resulting theory is so general that it can handle both equality and inequality constraints on both the state and the control. Further, there is a close relationship between the PMP and Kuhn–Tucker condition. See Canon et al. (1970) for details.

Recall that the optimal control computed using the PMP forces the model trajectory toward the observations. Hence, it is natural to interpret this optimal control as the additive optimal model error correction. In an effort to further understand the impact of knowing this optimal sequence of model errors, we take PMP one step further. Given an erroneous linear model with

While the PMP approach to dynamic data assimilation in meteorology is new, there are conceptual and methodological similarities between this approach and the vast literature devoted to analysis of model errors. We explore some of the similarities. The contributions in the area of model error correction are broadly classified along two lines—deterministic or stochastic model and the model constraint that is strong or weak.

In a stimulating paper, Derber (1989) first postulates that the deterministic model error can be expressed as the product of an unknown temporally fixed spatial function *φ* and a prespecified time-varying function. Using the model as a strong constraint, he then extends the 4DVAR method to estimate *φ* along with the initial conditions which to our knowledge represents the first attempt to quantify the model errors using the variational framework. Griffith and Nichols (2001) again postulate that the model error evolves according to an auxiliary model with unknown parameters. By augmenting this empirical secondary model with the given model, they then estimate both the initial condition and the parameters using the 4DVAR, using the model as a strong constraint. The PMP-based approach advocated in this paper does not rely on empirical auxiliary models.

It is also appropriate to briefly mention the earlier efforts in control theory and meteorology to account for model error. See Rauch et al. (1965), Friedland (1969), Bennett (1992), Bennett and Thorburn (1992), and Dee and da Silva (1998). In the spirit of these contributions, the work by Menard and Daley (1996) made the first attempt to relate Kalman smoothing to PMP. The primary difference between our approach and the Menard and Daley (1996) approach is that we consider a deterministic strong constraint model with time-varying errors while they develop a weak constraint stochastic model formulation with stochastic error terms with known covariance structure. Zupanski’s (1997) discussion of the advantages of the weak constraint formulation of 4DVAR for assessing systematic and random model errors is a meaningful complement to Menard and Daley (1996).

In section 2 we provide a robust derivation of the PMP for the general case of an (autonomous) nonlinear model where observations are (autonomous) nonlinear functions of the state. The computation of the optimal control sequence in this general case reduces to solving a nonlinear two-point boundary-value problem (TPBVP). In section 3 we then specialize these results for the case when both the model and observations are linear. In this case of linear dynamics, the well-known sweep method (Kalman 1963) is used to reduce the TPBVP to two initial-value problems. To illustrate the power of the PMP we have chosen the linear Burgers equation where the advection velocity is a sinusoidal function of the space variable; this linear model has many of the characteristics of Platzman’s (1964) classic study of Burgers’s nonlinear advection. Many of the key properties of this linear Burgers equation and its *n*-mode spectral counterpart [also known as the low-order model LOM(*n*)] obtained by using the standard Galerkin projection method (Shen et al. 2011) are described in section 4. Numerical experiments relating to the optimal control of LOM(4) are given in section 5. In a series of interesting papers, Majda and Timofeyev (2000, 2002) and Abramov et al. (2003) analyze the statistical properties of the solution of the *n*-mode spectral approximation to the nonlinear Burgers equation. Section 6 illustrates the computation of the consolidated correction matrix using the computed time series of optimal controls and the associated optimal trajectory. It is demonstrated that the uncontrolled solution of the corrected model (

## 2. Minimum principle in discrete form

### a. Stepwise solution of the variational problem

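Let the discrete-time model be

$$\mathbf{x}_{k+1} = \mathbf{M}(\mathbf{x}_k) + \mathbf{B}\mathbf{u}_k + \boldsymbol{\eta}_k,$$

where $\mathbf{x}_k \in \mathbb{R}^{n}$ is the model state, $\mathbf{M}: \mathbb{R}^{n} \to \mathbb{R}^{n}$ is the model map, $\boldsymbol{\eta}_k \in \mathbb{R}^{n}$ is the given intrinsic physical forcing that is part of the model, $p \le n$, $\mathbf{B} \in \mathbb{R}^{n \times p}$, and $\mathbf{u}_k \in \mathbb{R}^{p}$ is the new control or decision vector. As an example, when $p = 1$ and $\mathbf{B} = (1, 1, \ldots, 1)^{T} \in \mathbb{R}^{n}$, then the same (scalar) control $u_k$ is applied to each and every component of the state vector. At the other extreme, when $p = n$ and $\mathbf{B} = \mathbf{I}_n$, the identity matrix of order $n$, then $\mathbf{u}_k \in \mathbb{R}^{n}$ and the $i$th component of $\mathbf{u}_k$ is applied to the $i$th component of the state vector. It is assumed that the initial condition $\mathbf{x}_0$ is specified. Let

$$\mathbf{z}_k = \mathbf{h}(\mathbf{x}_k) + \mathbf{v}_k,$$

where $\mathbf{z}_k \in \mathbb{R}^{m}$ for some positive integer $m$ denotes the observation vector at time $k$, $\mathbf{h}: \mathbb{R}^{n} \to \mathbb{R}^{m}$ denotes the map (also known as the forward operator) that relates the model state $\mathbf{x}_k$ to the observation $\mathbf{z}_k$, and $\mathbf{v}_k$ is the observation noise vector, which is assumed to be white and Gaussian. That is, $\mathbf{v}_k \sim N(0, \mathbf{R}_k)$, where $\mathbf{R}_k \in \mathbb{R}^{m \times m}$ is a known positive definite matrix.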

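If $N$ is the number of observations, the cost functional $J$ in (2.4) is the sum of the stepwise costs $V_k$, where $V_k$ is a sum of two terms given by

$$V_k = V_k^{o} + V_k^{c}, \tag{2.5}$$

with

$$V_k^{o} = \tfrac{1}{2}\left\langle \mathbf{z}_k - \mathbf{h}(\mathbf{x}_k),\; \mathbf{R}_k^{-1}\left[\mathbf{z}_k - \mathbf{h}(\mathbf{x}_k)\right] \right\rangle, \qquad V_k^{c} = \tfrac{1}{2}\left\langle \mathbf{u}_k,\; \mathbf{C}\mathbf{u}_k \right\rangle.$$

The notation $\langle a, b \rangle$ indicates the standard inner product, and $\mathbf{C} \in \mathbb{R}^{p \times p}$ is a given symmetric and positive definite matrix. Clearly, $V_k^{o}$ is the energy in the error and $V_k^{c}$ is the energy in the control. For a given $\mathbf{R}_k^{-1}$, one can obtain a variety of tradeoffs between these two energy terms by appropriately choosing the matrix $\mathbf{C}$.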

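Define the Lagrangian $L$, obtained by augmenting the dynamical constraint in (2.1) with $J$ in (2.4), as follows:

$$L = \sum_{k=0}^{N-1} \left\{ V_k + \left\langle \boldsymbol{\lambda}_{k+1},\; \mathbf{M}(\mathbf{x}_k) + \mathbf{B}\mathbf{u}_k + \boldsymbol{\eta}_k - \mathbf{x}_{k+1} \right\rangle \right\}, \tag{2.9}$$

where $\mathbf{M}$ is the model map, $\mathbf{B}$ is the control matrix, and $\boldsymbol{\lambda}_k \in \mathbb{R}^{n}$ for $1 \le k \le N$ denotes the set of $N$ undetermined Lagrangian multipliers or the costate variables. Now define the associated Hamiltonian function

$$H_k = V_k + \left\langle \boldsymbol{\lambda}_{k+1},\; \mathbf{M}(\mathbf{x}_k) + \mathbf{B}\mathbf{u}_k + \boldsymbol{\eta}_k \right\rangle. \tag{2.10}$$

Substituting (2.10) in (2.9), the latter becomes

$$L = \sum_{k=0}^{N-1} \left[ H_k - \left\langle \boldsymbol{\lambda}_{k+1}, \mathbf{x}_{k+1} \right\rangle \right]. \tag{2.11}$$

By splitting the summation on the right-hand side of (2.11), we obtain

$$L = H_0 - \left\langle \boldsymbol{\lambda}_N, \mathbf{x}_N \right\rangle + \sum_{k=1}^{N-1} \left[ H_k - \left\langle \boldsymbol{\lambda}_k, \mathbf{x}_k \right\rangle \right].$$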

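Since $\boldsymbol{\eta}_k$ is specified, no variation of $\boldsymbol{\eta}_k$ is considered. Let $\delta L$ be the induced increment in $L$ resulting from the increments $\delta\mathbf{x}_k$ in $\mathbf{x}_k$ and $\delta\mathbf{u}_k$ in $\mathbf{u}_k$ for $0 \le k \le N - 1$ and $\delta\boldsymbol{\lambda}_k$ in $\boldsymbol{\lambda}_k$ for $0 \le k \le N$. Since $H_k$ is a scalar valued function of the vectors $\mathbf{x}_k$, $\mathbf{u}_k$, $\boldsymbol{\eta}_k$, and $\boldsymbol{\lambda}_{k+1}$, from the first principles (Lewis et al. 2006) we obtain

$$\delta L = \left\langle \nabla_{x} H_0, \delta\mathbf{x}_0 \right\rangle + \sum_{k=1}^{N-1} \left\langle \nabla_{x} H_k - \boldsymbol{\lambda}_k, \delta\mathbf{x}_k \right\rangle - \left\langle \boldsymbol{\lambda}_N, \delta\mathbf{x}_N \right\rangle + \sum_{k=0}^{N-1} \left\langle \nabla_{u} H_k, \delta\mathbf{u}_k \right\rangle + \sum_{k=0}^{N-1} \left\langle \nabla_{\lambda} H_k - \mathbf{x}_{k+1}, \delta\boldsymbol{\lambda}_{k+1} \right\rangle, \tag{2.13}$$

where $\nabla_{x} H_k \in \mathbb{R}^{n}$, $\nabla_{u} H_k \in \mathbb{R}^{p}$, and $\nabla_{\lambda} H_k \in \mathbb{R}^{n}$ are the gradients of $H_k$ with respect to $\mathbf{x}_k$, $\mathbf{u}_k$, and $\boldsymbol{\lambda}_{k+1}$, respectively.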

Recall that $\delta L$ must be zero at the minimum, and in view of the arbitrariness of $\delta\mathbf{x}_k$, $\delta\mathbf{u}_k$, and $\delta\boldsymbol{\lambda}_k$, we readily obtain a set of necessary conditions expressed as follows, all for $0 \le k \le N - 1$.

#### 1) Condition 1: Model dynamics

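The first necessary condition is

$$\nabla_{\lambda} H_k - \mathbf{x}_{k+1} = 0. \tag{2.14}$$

Computing the gradient of $H_k$ in (2.10) with respect to $\boldsymbol{\lambda}_{k+1}$ and substituting it in (2.14), the latter becomes

$$\mathbf{x}_{k+1} = \mathbf{M}(\mathbf{x}_k) + \mathbf{B}\mathbf{u}_k + \boldsymbol{\eta}_k,$$

which in fact turns out to be the model equations given in (2.2). Stated in other words, Pontryagin’s method dictates that the sequence of states $\mathbf{x}_k$ arise as a solution of the model used as a strong constraint.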

#### 2) Condition 2: Costate or adjoint dynamics

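The second necessary condition is

$$\nabla_{x} H_k - \boldsymbol{\lambda}_k = 0. \tag{2.16}$$

Computing the gradient of $H_k$ in (2.10) with respect to the model state $\mathbf{x}_k$ and using it in (2.16), the latter becomes

$$\boldsymbol{\lambda}_k = \mathbf{D}_x^{T}(\mathbf{M})\,\boldsymbol{\lambda}_{k+1} + \nabla_{x} V_k,$$

where $\nabla_{x} V_k$ is the gradient of $V_k$ in (2.5) given by

$$\nabla_{x} V_k = \mathbf{D}_x^{T}(\mathbf{h})\,\mathbf{R}_k^{-1}\left[\mathbf{h}(\mathbf{x}_k) - \mathbf{z}_k\right],$$

which is the normalized forecast error viewed from the model space, $\mathbf{D}_x(\mathbf{h}) = [\partial h_i / \partial x_j] \in \mathbb{R}^{m \times n}$ is the Jacobian of the forward operator $\mathbf{h}(\mathbf{x})$ with respect to the model state, and $\mathbf{D}_x(\mathbf{M}) = [\partial M_i / \partial x_j] \in \mathbb{R}^{n \times n}$ is the Jacobian of the model map $\mathbf{M}(\mathbf{x})$.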

#### 3) Condition 3: Stationarity condition

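The third necessary condition is

$$\nabla_{u} H_k = 0. \tag{2.22}$$

Computing the gradient of $H_k$ in (2.10) with respect to the control $\mathbf{u}_k$ and using it in (2.22), the latter becomes

$$\nabla_{u} V_k + \mathbf{D}_u^{T}(\mathbf{M})\,\boldsymbol{\lambda}_{k+1} = 0.$$

From (2.5) to (2.7) we get the gradient of $V_k$ with respect to $\mathbf{u}_k$,

$$\nabla_{u} V_k = \mathbf{C}\mathbf{u}_k,$$

and from (2.2) we get

$$\mathbf{D}_u(\mathbf{M}) = \mathbf{B},$$

which is the Jacobian of the model in (2.2) with respect to the control $\mathbf{u}_k$. Solving for the control gives

$$\mathbf{u}_k = -\mathbf{C}^{-1}\mathbf{B}^{T}\boldsymbol{\lambda}_{k+1}. \tag{2.26}$$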

The initial state $\mathbf{x}_0$ is given but the final state $\mathbf{x}_N$ is free. Hence $\delta\mathbf{x}_0 = 0$ and $\delta\mathbf{x}_N$ is arbitrary. Thus, the first term on the right-hand side of (2.13) is automatically zero. The third term is zero by forcing

$$\boldsymbol{\lambda}_N = 0.$$

The above analysis naturally leads to a framework for optimal control, which is stated below.

##### (i) Step 1: Compute the optimal control

The structure of the optimal control sequence **u**_{k} is computed by solving the stationarity condition (2.22) and is given by (2.26).

##### (ii) Step 2: Solve the nonlinear TPBVP

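Using the form of the optimal control in (2.26), solve the nonlinear TPBVP

$$\mathbf{x}_{k+1} = \mathbf{M}(\mathbf{x}_k) - \mathbf{B}\mathbf{C}^{-1}\mathbf{B}^{T}\boldsymbol{\lambda}_{k+1} + \boldsymbol{\eta}_k, \tag{2.28}$$

$$\boldsymbol{\lambda}_k = \mathbf{D}_x^{T}(\mathbf{M})\,\boldsymbol{\lambda}_{k+1} + \mathbf{D}_x^{T}(\mathbf{h})\,\mathbf{R}_k^{-1}\left[\mathbf{h}(\mathbf{x}_k) - \mathbf{z}_k\right], \tag{2.29}$$

where the initial condition $\mathbf{x}_0$ for (2.28) is given and the final condition $\boldsymbol{\lambda}_N = 0$ is given for (2.29). Clearly, the solution of (2.28) and (2.29) gives the optimal trajectory. A number of observations are in order.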

The importance of the Hamiltonian formulation of the Euler–Lagrange necessary condition for the minimum stems from the simplicity and conciseness of the two first-order equations (2.14) and (2.16) involving the state and the costate/adjoint variables. This Hamiltonian formulation has been the basis of countless developments in physics (Goldstein 1980).

### b. Computation of optimal control

Equation (2.28), a representation of the model dynamics, is solved forward in time starting from the known initial condition $\mathbf{x}_0$. But the adjoint equation (2.29), representing the costate/adjoint dynamics, is solved backward in time starting from $\boldsymbol{\lambda}_N = 0$. The two systems in (2.28) and (2.29) form a nonlinear coupled two-point boundary value problem, which in general does not admit a closed form solution. A number of numerical methods for solving (2.28) and (2.29) have been developed in the literature, a sampling of which is found in Roberts and Shipman (1972), Keller (1976), Polak (1997), and Bryson (1999). A closed form solution to the optimal control problem exists for the special case when the model dynamics is linear and the cost function $V_k$ is a quadratic form in the state $\mathbf{x}_k$ and the control $\mathbf{u}_k$. This special case is covered in section 3 of this paper.
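One common numerical route, when no closed form is available, is to descend on the cost using a gradient obtained from the backward costate sweep. The sketch below is illustrative only: it assumes a scalar state, a hypothetical model map (one Euler step of dx/dt = x - x³), h(x) = x, scalar weights r and c, and a cost summed over k = 0, ..., N-1. All function names are invented for the example, and plain steepest descent stands in for the more capable methods cited above.

```python
def model(x):
    """Hypothetical scalar model map M(x): one Euler step of dx/dt = x - x**3."""
    return x + 0.1 * (x - x ** 3)


def model_jacobian(x):
    """Derivative dM/dx of the hypothetical model map."""
    return 1.0 + 0.1 * (1.0 - 3.0 * x ** 2)


def forward(x0, controls, b=1.0):
    """Run x_{k+1} = M(x_k) + b*u_k forward from the given x0."""
    states = [x0]
    for u in controls:
        states.append(model(states[-1]) + b * u)
    return states


def cost(x0, controls, obs, r=0.1, c=1.0, b=1.0):
    """J = sum_k [(z_k - x_k)^2 / (2r) + c*u_k^2 / 2] for k = 0..N-1 (assumed form)."""
    states = forward(x0, controls, b)
    return sum(0.5 * (obs[k] - states[k]) ** 2 / r + 0.5 * c * controls[k] ** 2
               for k in range(len(controls)))


def gradient(x0, controls, obs, r=0.1, c=1.0, b=1.0):
    """dJ/du_k = c*u_k + b*lambda_{k+1}, with the costate swept back from lambda_N = 0."""
    states = forward(x0, controls, b)
    lam = 0.0  # lambda_N
    grad = [0.0] * len(controls)
    for k in range(len(controls) - 1, -1, -1):
        grad[k] = c * controls[k] + b * lam  # lam holds lambda_{k+1} at this point
        lam = model_jacobian(states[k]) * lam + (states[k] - obs[k]) / r  # lambda_k
    return grad


def descend(x0, controls, obs, step=1e-3, iterations=200):
    """Plain steepest descent on the control sequence (illustrative, not efficient)."""
    u = list(controls)
    for _ in range(iterations):
        g = gradient(x0, u, obs)
        u = [uk - step * gk for uk, gk in zip(u, g)]
    return u


obs = [0.6, 0.7, 0.8, 0.9, 1.0]
u_start = [0.0] * 5
u_opt = descend(0.5, u_start, obs)
print(cost(0.5, u_start, obs), cost(0.5, u_opt, obs))
```

The gradient vanishes exactly when the stationarity condition holds, so a descent of this kind converges toward a solution of the coupled forward and backward equations rather than solving them directly.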

### c. Connection to 4DVAR

#### 1) Condition 1A: Model dynamics

#### 2) Condition 2A: Costate/adjoint dynamics

Since $\delta\mathbf{x}_k$ is arbitrary, the fourth term on the right-hand side of (2.34) must vanish, which in the light of (2.33) yields the costate dynamics. Since $\mathbf{x}_N$ is free, $\delta\mathbf{x}_N$ is arbitrary. Hence, the second term in (2.33) must vanish as well, which fixes the terminal condition on the costate. Combining these, we readily see that (2.34) reduces to a simpler form, from which, using first principles and (2.33), the final result follows. The above development naturally leads to the standard 4DVAR algorithm (Lewis et al. 2006, 411–412), which is summarized below:
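As an illustration of the forward and backward structure that 4DVAR shares with the PMP, the following sketch computes the gradient of an observation-misfit cost with respect to the initial condition by a backward adjoint sweep. It is a minimal scalar example under stated assumptions, not the algorithm as summarized in Lewis et al. (2006): the model map, the cost J(x0) = Σ (x_k - z_k)²/(2r) over k = 1, ..., N, the identity observation operator, and all names are hypothetical.

```python
def model(x):
    """Hypothetical scalar model map M(x): one Euler step of dx/dt = x - x**3."""
    return x + 0.1 * (x - x ** 3)


def model_jacobian(x):
    """Derivative dM/dx of the hypothetical model map."""
    return 1.0 + 0.1 * (1.0 - 3.0 * x ** 2)


def trajectory(x0, n_steps):
    """Forward run of the uncontrolled model x_{k+1} = M(x_k)."""
    states = [x0]
    for _ in range(n_steps):
        states.append(model(states[-1]))
    return states


def cost(x0, obs, r=0.1):
    """J(x0) = sum_{k=1}^{N} (x_k - z_k)^2 / (2r); obs[k-1] holds z_k (assumed form)."""
    states = trajectory(x0, len(obs))
    return sum(0.5 * (states[k] - obs[k - 1]) ** 2 / r
               for k in range(1, len(obs) + 1))


def gradient(x0, obs, r=0.1):
    """dJ/dx0 from a backward adjoint sweep over the stored forward trajectory."""
    states = trajectory(x0, len(obs))
    lam = 0.0  # no contribution beyond the last observation time
    for k in range(len(obs), 0, -1):
        lam = model_jacobian(states[k]) * lam + (states[k] - obs[k - 1]) / r  # lambda_k
    return model_jacobian(states[0]) * lam


obs = [0.7, 0.8, 0.9, 1.0]
print(cost(0.5, obs), gradient(0.5, obs))
```

In contrast with the control problem, the only free variable here is the initial condition, so a single number (in general, one vector) is handed to the minimizer at each iteration.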

## 3. Optimal tracking: Linear dynamics

In this section we apply the minimum principle developed in section 2 to the problem of finding an explicit form for the optimal control or forcing that will drive the dynamics to track or follow the given set of observations when the model is linear and the performance measure is a quadratic function of the state and the control (Kalman 1963; Catlin 1989).

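Let the model be

$$\mathbf{x}_{k+1} = \mathbf{M}\mathbf{x}_k + \mathbf{B}\mathbf{u}_k + \boldsymbol{\eta}_k, \tag{3.1}$$

where $\mathbf{M} \in \mathbb{R}^{n \times n}$, $\boldsymbol{\eta}_k \in \mathbb{R}^{n}$ is the intrinsic forcing term that is part of the model, and $\mathbf{B} \in \mathbb{R}^{n \times p}$, which is the special case of (2.1). Let the observations be given by

$$\mathbf{z}_k = \mathbf{H}\mathbf{x}_k + \mathbf{v}_k, \tag{3.2}$$

where $\mathbf{H} \in \mathbb{R}^{m \times n}$ and $\mathbf{v}_k \sim N(0, \mathbf{R})$, with $\mathbf{R} \in \mathbb{R}^{m \times m}$ the known positive definite matrix denoting the covariance of $\mathbf{v}_k$.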

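- Structure of optimal control. From the stationarity condition developed in (2.22)–(2.26), it readily follows that the structure of the optimal control in this linear case is given by

  $$\mathbf{u}_k = -\mathbf{C}^{-1}\mathbf{B}^{T}\boldsymbol{\lambda}_{k+1}, \tag{3.4}$$

  which is the same as in the nonlinear case treated in section 2.
- The linear two-point boundary value problem. Substituting the special form of the dynamics and the observation given by (3.1)–(3.3) and the expression for $\mathbf{u}_k$ given by (3.4) in (2.28) and (2.29), the latter pair of equations become

  $$\mathbf{x}_{k+1} = \mathbf{M}\mathbf{x}_k - \mathbf{B}\mathbf{C}^{-1}\mathbf{B}^{T}\boldsymbol{\lambda}_{k+1} + \boldsymbol{\eta}_k, \tag{3.5}$$

  $$\boldsymbol{\lambda}_k = \mathbf{M}^{T}\boldsymbol{\lambda}_{k+1} + \mathbf{H}^{T}\mathbf{R}^{-1}\left(\mathbf{H}\mathbf{x}_k - \mathbf{z}_k\right), \tag{3.6}$$

  where we have used the fact that $\mathbf{h}(\mathbf{x}) = \mathbf{H}\mathbf{x}$ and $\mathbf{D}_x(\mathbf{h}) = \mathbf{H}$. The initial condition for (3.5) is the given $\mathbf{x}_0$ and the final condition for (3.6) is $\boldsymbol{\lambda}_N = 0$. Again, recall that (3.6) is the well-known adjoint equation that routinely arises in the 4DVAR analysis (Lewis et al. 2006, 408–412). For later reference we rewrite (3.5) and (3.6) as

  $$\mathbf{x}_{k+1} = \mathbf{M}\mathbf{x}_k - \mathbf{E}\boldsymbol{\lambda}_{k+1} + \boldsymbol{\eta}_k, \qquad \boldsymbol{\lambda}_k = \mathbf{M}^{T}\boldsymbol{\lambda}_{k+1} + \mathbf{F}\mathbf{x}_k - \mathbf{W}\mathbf{z}_k, \tag{3.7}$$

  where $\mathbf{E} = \mathbf{B}\mathbf{C}^{-1}\mathbf{B}^{T}$, $\mathbf{F} = \mathbf{H}^{T}\mathbf{R}^{-1}\mathbf{H}$, and $\mathbf{W} = \mathbf{H}^{T}\mathbf{R}^{-1}$.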

It turns out this special linear TPBVP can be transformed into a pair of initial value problems using the sweep method, which in turn can be easily solved. By exploiting the structure of (3.5) and (3.6), it can be verified (see appendix A for details) that *λ*_{k} is an affine function of the state **x**_{k}.

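More precisely, $\boldsymbol{\lambda}_k$ is related to $\mathbf{x}_k$ via a general affine transformation of the type (3.8), defined by a matrix $\mathbf{P}_k$ and a vector $\mathbf{g}_k$. Substituting this transformation into the TPBVP and requiring the result to hold for every $\mathbf{x}_k$, we immediately obtain equations that define the evolution of the matrix $\mathbf{P}_k$ and the vector $\mathbf{g}_k$: (3.11), which is a nonlinear matrix Riccati equation, and (3.12). Since $\boldsymbol{\lambda}_N = 0$ and $\mathbf{x}_N$ is arbitrary, from (3.8) it is immediately clear that

$$\mathbf{P}_N = 0, \qquad \mathbf{g}_N = 0.$$

Against this backdrop, we now state the algorithm for computing the optimal control and the optimal trajectory.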

- Step 1. Given (3.1)–(3.3), compute $\mathbf{E} = \mathbf{B}\mathbf{C}^{-1}\mathbf{B}^{T}$, $\mathbf{F} = \mathbf{H}^{T}\mathbf{R}^{-1}\mathbf{H}$, and $\mathbf{W} = \mathbf{H}^{T}\mathbf{R}^{-1}$. Solve the nonlinear matrix Riccati difference equation in (3.11) for $\mathbf{P}_k$ backward in time starting at $\mathbf{P}_N = 0$. Since this computation is independent of the observations, it can be precomputed and stored if needed.
- Step 2. Solve the linear vector difference equation in (3.12) for $\mathbf{g}_k$ backward in time starting from $\mathbf{g}_N = 0$. Notice that $\mathbf{g}_k$ depends on the observations and the intrinsic forcing $\boldsymbol{\eta}_k$ that is part of the given model. It will be seen that the impact of the observations on the optimal control is through $\mathbf{g}_k$.
- Step 3. Once $\mathbf{P}_k$ and $\mathbf{g}_k$ are known, we can compose the optimal control using (3.4) and (3.8) to obtain (3.14). Using (3.1) in (3.14), premultiplying both sides by $\mathbf{C}$, and simplifying, we get the explicit expression (3.16) for the optimal control in terms of a feedback gain acting on $\mathbf{x}_k$ and a feedforward gain acting on $\mathbf{g}_{k+1}$. From (3.1) and (3.16), the optimal trajectory then follows.
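The three steps above can be sketched in code for the scalar case (n = m = p = 1). The sketch assumes that the affine relation takes the form λ_k = P_k x_k - g_k (a sign convention chosen here, not confirmed by the text) and that the cost is summed over k = 0, ..., N-1. Under those assumptions, substituting into the two-point boundary value problem gives the backward recursions P_k = m² P_{k+1} / (1 + e P_{k+1}) + f and g_k = m (g_{k+1} - P_{k+1} η_k) / (1 + e P_{k+1}) + w z_k, with e = b²/c, f = h²/r, and w = h/r. All function and variable names are illustrative.

```python
def sweep_tracking(m, b, h, r, c, x0, obs, forcing):
    """Scalar linear-quadratic tracking by the sweep method (illustrative sketch).

    Model: x_{k+1} = m*x_k + b*u_k + eta_k, observations z_k = h*x_k + noise.
    Assumes lambda_k = P_k*x_k - g_k with P_N = g_N = 0.
    """
    n_steps = len(obs)
    e, f, w = b * b / c, h * h / r, h / r
    P = [0.0] * (n_steps + 1)
    g = [0.0] * (n_steps + 1)
    # Steps 1 and 2: backward sweeps for P_k (Riccati) and g_k (linear).
    for k in range(n_steps - 1, -1, -1):
        denom = 1.0 + e * P[k + 1]
        P[k] = m * P[k + 1] * m / denom + f
        g[k] = m * (g[k + 1] - P[k + 1] * forcing[k]) / denom + w * obs[k]
    # Step 3: forward pass assembling the control and the optimal trajectory.
    xs, us, lams = [x0], [], [P[0] * x0 - g[0]]
    for k in range(n_steps):
        x_next = (m * xs[k] + e * g[k + 1] + forcing[k]) / (1.0 + e * P[k + 1])
        lam_next = P[k + 1] * x_next - g[k + 1]
        us.append(-b / c * lam_next)
        xs.append(x_next)
        lams.append(lam_next)
    return xs, us, lams


def tracking_cost(m, b, h, r, c, x0, controls, obs, forcing):
    """Cost sum_k [(z_k - h*x_k)^2 / (2r) + c*u_k^2 / 2] for a given control sequence."""
    x, total = x0, 0.0
    for k, u in enumerate(controls):
        total += 0.5 * (obs[k] - h * x) ** 2 / r + 0.5 * c * u * u
        x = m * x + b * u + forcing[k]
    return total


obs = [1.0, 0.95, 0.9, 0.85, 0.8]
forcing = [0.01] * 5
xs, us, lams = sweep_tracking(0.9, 1.0, 1.0, 0.01, 1.0, 1.1, obs, forcing)
print(us)
print(tracking_cost(0.9, 1.0, 1.0, 0.01, 1.0, 1.1, us, obs, forcing))
```

Because the backward sweep for P_k never touches the observations, it could be run once and reused for any observation sequence of the same length; only g_k has to be recomputed.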

The optimal control is $\mathbf{u}_k$, and it plays a dual role. First, it forces the model trajectory toward the observations, where the measure of closeness depends on the choice of $p$, the dimension of the control vector $\mathbf{u}_k$, the matrix $\mathbf{B} \in \mathbb{R}^{n \times p}$, and the matrix $\mathbf{C} \in \mathbb{R}^{p \times p}$, where it is assumed that the observational error covariance matrix $\mathbf{R}$ is given. Second, $\mathbf{u}_k$ contains information about the model error. Thus, for a given value of $\mathbf{R}$, one can shape $\mathbf{u}_k$ to achieve this goal by suitably varying the integer $p$ ($1 \le p \le n$), $\mathbf{B} \in \mathbb{R}^{n \times p}$, and $\mathbf{C} \in \mathbb{R}^{p \times p}$.

## 4. Dynamical constraint: Linear Burgers’s equation

To illustrate Pontryagin’s method, we choose a dynamic constraint that follows the theme of Platzman’s classical study of Burgers’s equation (Platzman 1964). In that study, Platzman investigated the evolution of an initial single primary sine wave over the interval [0, 2*π*]. The governing dynamics described the transfer of energy from this primary wave to waves of higher wavenumber as the wave neared the breaking point. In a tour de force with spectral dynamics, Platzman obtained a closed form solution for the Fourier amplitudes and then analyzed the consequences of truncated spectral expansions. The contribution was instrumental in helping dynamic meteorologists understand the penalties associated with truncated spectral weather forecasting in the early days of numerical weather prediction.

We maintain the spirit of Platzman’s investigation but in a somewhat simplified form. Whereas the nonlinear dynamic law advects the wave with the full spectrum of Fourier components, we choose to advect with only the initial primary wave—sin(*x*). This problem retains the transfer of energy from the primary wave to the higher wavenumber components as the wave steepens, but the more complex phenomenon of folding over or breaking of the wave is absent in this linear problem.

### a. Model and its analytic solution

Consider the linear Burgers equation

$$\frac{\partial q}{\partial t} + \sin(x)\,\frac{\partial q}{\partial x} = 0, \tag{4.1}$$

with initial condition $q(x, 0) = \sin(x)$ and boundary conditions $q(0, t) = q(2\pi, t) = 0$. Following Platzman (1964), we seek a solution to (4.1) by the method of characteristics (Carrier and Pearson 1976). The characteristics of (4.1) are given by

$$\tan\left(\frac{x}{2}\right) = e^{t}\tan\left(\frac{x_0}{2}\right),$$

where $x_0$ is the intersection of a particular characteristic curve with the line of initial time ($t = 0$). Using the mathematical expression for the characteristics in concert with the initial condition, the analytic solution is found to be

$$q(x, t) = \sin\left\{2\arctan\left[e^{-t}\tan\left(\frac{x}{2}\right)\right]\right\}. \tag{4.3}$$

From this analytic solution, the profiles of the wave at times $t = 0$, 0.5, 1.0, 1.5, and 2.0 are shown in Fig. 1. The slope of the wave is finite at $x = \pi$ but its magnitude approaches $\infty$ as $t \to \infty$.
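The analytic solution is easy to evaluate numerically, which gives a quick check on the characteristics argument. The sketch below assumes the advecting velocity is sin(x), as described at the start of this section, and rewrites the solution in an algebraically equivalent form that avoids the singularity of tan(x/2) at x = π. The function names are illustrative.

```python
import math


def q_exact(x, t):
    """Solution of q_t + sin(x) q_x = 0 with q(x, 0) = sin(x).

    Equals sin(x0) with tan(x0/2) = exp(-t) * tan(x/2), written without tan
    so that it stays finite at x = pi.
    """
    s, c = math.sin(0.5 * x), math.cos(0.5 * x)
    a = math.exp(-t)
    return 2.0 * a * s * c / (c * c + a * a * s * s)


def pde_residual(x, t, d=1e-5):
    """Central-difference estimate of q_t + sin(x) q_x, which should be near zero."""
    q_t = (q_exact(x, t + d) - q_exact(x, t - d)) / (2.0 * d)
    q_x = (q_exact(x + d, t) - q_exact(x - d, t)) / (2.0 * d)
    return q_t + math.sin(x) * q_x


for t in (0.0, 0.5, 1.0, 1.5, 2.0):
    slope = (q_exact(math.pi + 1e-6, t) - q_exact(math.pi - 1e-6, t)) / 2e-6
    print(t, slope, pde_residual(2.5, t))
```

In this form the slope at x = π works out to -exp(t), which is finite at every time and grows without bound, in line with the steepening described above.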

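Let $q_k(t)$ denote the $k$th Fourier coefficient of the solution $q(x, t)$ in (4.3) given by

$$q_k(t) = \frac{1}{\pi}\int_0^{2\pi} q(x, t)\,\sin(kx)\,dx.$$

Define the vector

$$\mathbf{q}(t) = \left[q_1(t), q_2(t), \ldots, q_n(t)\right]^{T}$$

of the first $n$ Fourier coefficients of $q(x, t)$. The values of the coefficients $q_k(t)$ for $1 \le k \le n = 8$ and $0 \le t \le 2.0$ in steps of $\Delta t = 0.2$ are given in (rows of) Table 1.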
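A quadrature of this kind can be sketched as follows. The sketch assumes the sine-series normalization (1/π) ∫ q(x, t) sin(kx) dx over [0, 2π] and uses the rectangle rule on a uniform periodic grid, which may differ from the quadrature formula actually used for Table 1. The helper names are illustrative.

```python
import math


def q_exact(x, t):
    """Analytic solution of the linear Burgers equation with q(x, 0) = sin(x)."""
    s, c = math.sin(0.5 * x), math.cos(0.5 * x)
    a = math.exp(-t)
    return 2.0 * a * s * c / (c * c + a * a * s * s)


def fourier_coefficient(k, t, n_points=2048):
    """(1/pi) * integral over [0, 2*pi] of q(x, t) sin(kx) dx, rectangle rule."""
    dx = 2.0 * math.pi / n_points
    total = sum(q_exact(j * dx, t) * math.sin(k * j * dx) for j in range(n_points))
    return total * dx / math.pi


def coefficients(t, n_modes):
    """First n_modes Fourier coefficients q_1(t), ..., q_n(t)."""
    return [fourier_coefficient(k, t) for k in range(1, n_modes + 1)]


def evaluate(coeffs, x):
    """Truncated sine series sum_i q_i sin(i x)."""
    return sum(qi * math.sin((i + 1) * x) for i, qi in enumerate(coeffs))


for t in (0.0, 1.0, 2.0):
    print(t, [round(v, 4) for v in coefficients(t, 8)])
```

For a smooth periodic integrand the rectangle rule converges very quickly, so a few thousand points are far more than needed here; the grid size is a convenience, not a requirement.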

Table 1. Values of the first eight Fourier coefficients of $q(x, t)$ in (4.3) for various times, computed using a quadrature formula.

The $n$-mode approximation $\hat{q}_n(x, t)$ [resulting from spectral truncation of $q(x, t)$] is then given by

$$\hat{q}_n(x, t) = \sum_{i=1}^{n} q_i(t)\,\sin(ix).$$

A comparison of the exact solution $q(x, t)$ with the four-mode and eight-mode approximations ($n = 4$ and 8, respectively) at $t = 2.0$ is given in Fig. 2. As is to be expected, the eight-mode approximation is closer to the true solution than is the four-mode approximation. Further, the errors are the greatest at the point of extreme steepness of the wave.

### b. The low-order model

In demonstrating the power of Pontryagin’s method developed in sections 2 and 3, our immediate goal is to obtain a discrete time model representative of (3.1). There are at least two ways, in principle, to achieve this goal. The first way is to directly discretize (4.1) by embedding a grid in the two-dimensional domain with 0 ≤ *x* ≤ 2*π* and *t* ≥ 0. Second is to project the infinite dimensional system in (4.1) onto a finite dimensional space using the standard Galerkin projection method and obtain a system of *n* ordinary differential equations (ODEs) describing the evolution of the Fourier amplitudes **q**_{i}(*t*) in (4.4), 1 ≤ *i* ≤ *n*. The resulting *n*th-order system is known as the low-order model (LOM). Then LOM can be discretized using one of several known methods. In this paper we embrace this latter approach using LOM.

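The solution $q(x, t)$ consists of an infinite series of sine waves given by

$$q(x, t) = \sum_{i=1}^{\infty} q_i(t)\,\sin(ix).$$

The equations of LOM($n$) describing the evolution of the amplitudes of the spectral components are obtained by exploiting the orthogonality properties of the $\sin(ix)$ functions for $1 \le i \le n$. Substituting (4.4) into (4.1), multiplying both sides by $\sin(ix)$, and integrating both sides from 0 to $2\pi$, we obtain the LOM($n$) (also known as the spectral model):

$$\frac{d\mathbf{q}}{dt} = \mathbf{A}\mathbf{q}, \tag{4.8}$$

where its initial condition is $\mathbf{q}(0) = (1, 0, \ldots, 0)^{T}$ and the matrix $\mathbf{A} \in \mathbb{R}^{n \times n}$ is tridiagonal with zero diagonal, subdiagonal entry $\tfrac{1}{2}a_i$ and superdiagonal entry $\tfrac{1}{2}c_i$ in row $i$, with $a_i = -(i - 1)$ and $c_i = (i + 1)$. An example for $n = 4$ is given by

$$\mathbf{A} = \frac{1}{2}\begin{pmatrix} 0 & 2 & 0 & 0 \\ -1 & 0 & 3 & 0 \\ 0 & -2 & 0 & 4 \\ 0 & 0 & -3 & 0 \end{pmatrix}. \tag{4.11}$$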
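The matrix of the low-order model can be built for any n from the entries a_i and c_i. The sketch below assumes the factor of one-half that arises when sin(x) cos(jx) is expanded into sines during the Galerkin projection; the function names are illustrative.

```python
def lom_matrix(n):
    """Tridiagonal LOM(n) matrix: row i has (1/2)*a_i below and (1/2)*c_i above the diagonal."""
    A = [[0.0] * n for _ in range(n)]
    for i in range(1, n + 1):  # i is the 1-based mode number
        if i > 1:
            A[i - 1][i - 2] = -0.5 * (i - 1)  # (1/2) * a_i with a_i = -(i - 1)
        if i < n:
            A[i - 1][i] = 0.5 * (i + 1)       # (1/2) * c_i with c_i = i + 1
    return A


def tendency(A, q):
    """Right-hand side A q of the low-order model dq/dt = A q."""
    return [sum(a * qj for a, qj in zip(row, q)) for row in A]


A4 = lom_matrix(4)
for row in A4:
    print(row)
print(tendency(A4, [1.0, 0.0, 0.0, 0.0]))
```

Starting from the primary wave alone, the only nonzero tendency is in the second mode, which is how energy first leaks to higher wavenumbers.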

We now state a number of interesting properties of the solution of the LOM(*n*) in (4.8).

#### 1) Conservation of energy

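Consider the quadratic form $E(\mathbf{q})$ representing generalized energy and given by

$$E(\mathbf{q}) = \tfrac{1}{2}\,\mathbf{q}^{T}\mathbf{K}\mathbf{q},$$

where $\mathbf{K} = \mathrm{Diag}(1, 2, \ldots, n)$ is a diagonal matrix with the indicated entries as its diagonal elements. It can be verified that the time derivative of $E(\mathbf{q})$ evaluated along the solution of (4.8) is given by

$$\frac{dE}{dt} = \mathbf{q}^{T}\mathbf{K}\mathbf{A}\mathbf{q}.$$

From the form of $\mathbf{K}\mathbf{A}$, this matrix is skew-symmetric with superdiagonal entries $\tfrac{1}{2}s_i$ and subdiagonal entries $-\tfrac{1}{2}s_i$, where $s_i = i(i + 1)$ for $1 \le i \le n - 1$. Hence, it can be verified that the quadratic form $\mathbf{q}^{T}\mathbf{K}\mathbf{A}\mathbf{q}$ is zero, which in turn implies that the energy $E(\mathbf{q})$ is conserved by (4.8); that is,

$$E[\mathbf{q}(t)] = E[\mathbf{q}(0)]. \tag{4.16}$$

An immediate consequence of (4.16) is that the solution $\mathbf{q}(t)$ of (4.8) always lies on the surface of an $n$-dimensional ellipsoid defined by

$$\sum_{k=1}^{n} k\,q_k^{2} = 2E[\mathbf{q}(0)].$$

Clearly, the length of the $k$th semiaxis of this ellipsoid is given by $\sqrt{2E[\mathbf{q}(0)]/k}$. Since the product of the semiaxes is inversely proportional to $\sqrt{n!}$ and $n! = O(2^{n\log n})$, it turns out that the volume of this ellipsoid goes to zero at an exponential rate as $n$ increases, signaling degeneracy for large $n$.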
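The skew-symmetry behind this conservation law can be checked in exact rational arithmetic. The sketch below assumes the weighting Diag(1, 2, ..., n) and the one-half factor in the LOM matrix, which together reproduce the off-diagonal pattern i(i + 1); both are reconstructions from the surrounding text, and the function names are illustrative.

```python
from fractions import Fraction


def lom_matrix(n):
    """LOM(n) matrix with exact rational entries."""
    A = [[Fraction(0)] * n for _ in range(n)]
    for i in range(1, n + 1):
        if i > 1:
            A[i - 1][i - 2] = Fraction(-(i - 1), 2)
        if i < n:
            A[i - 1][i] = Fraction(i + 1, 2)
    return A


def weighted_matrix(A):
    """K A with K = Diag(1, 2, ..., n): row i of A scaled by i."""
    return [[(i + 1) * a for a in row] for i, row in enumerate(A)]


def is_skew_symmetric(S):
    """True when S equals minus its transpose."""
    n = len(S)
    return all(S[i][j] == -S[j][i] for i in range(n) for j in range(n))


def energy_rate(A, q):
    """dE/dt = q^T K A q along dq/dt = A q, for E = (1/2) q^T K q."""
    KA = weighted_matrix(A)
    n = len(q)
    return sum(q[i] * KA[i][j] * q[j] for i in range(n) for j in range(n))


A = lom_matrix(4)
print(is_skew_symmetric(weighted_matrix(A)))
print(energy_rate(A, [Fraction(3, 7), Fraction(-2), Fraction(5, 3), Fraction(1, 9)]))
```

Exact arithmetic is used so that the zero energy rate is an identity rather than a rounding artifact.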

#### 2) Solution of LOM(*n*) in (4.8)

Much like the PDE (4.1), its LOM($n$) counterpart in (4.8) can also be solved exactly. The process of obtaining its solution is quite involved. To minimize the digression from the main development, we have chosen to describe this solution process in appendix B. The eigenstructure of $\mathbf{A}$ bears on the choice of $\Delta t$, the time discretization interval to be used in the following section.

## 5. Numerical experiments

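Discretizing (4.8) with $n = 4$ using the first-order Euler scheme, we obtain

$$\boldsymbol{\xi}_{k+1} = \mathbf{M}\boldsymbol{\xi}_k, \tag{5.1}$$

where $\boldsymbol{\xi}_k = \mathbf{q}(t = k\Delta t)$ and $\Delta t$ denotes the length of the time interval used in time discretization and

$$\mathbf{M} = \mathbf{I} + \Delta t\,\mathbf{A},$$

where $\mathbf{A} \in \mathbb{R}^{4 \times 4}$ is given in (4.11) and the initial condition in (4.9).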
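One consequence of the first-order scheme is worth seeing numerically: the Euler step does not inherit the energy conservation of the continuous low-order model. The sketch below assumes Δt = 0.2 (borrowed from the spacing of Table 1, not stated for the model run), the generalized energy (1/2) Σ k q_k², and the initial condition (1, 0, 0, 0). The function names are illustrative.

```python
def lom_matrix(n):
    """Tridiagonal LOM(n) matrix with entries -(i-1)/2 below and (i+1)/2 above the diagonal."""
    A = [[0.0] * n for _ in range(n)]
    for i in range(1, n + 1):
        if i > 1:
            A[i - 1][i - 2] = -0.5 * (i - 1)
        if i < n:
            A[i - 1][i] = 0.5 * (i + 1)
    return A


def euler_step(A, xi, dt):
    """One step of xi_{k+1} = (I + dt*A) xi_k."""
    n = len(xi)
    return [xi[i] + dt * sum(A[i][j] * xi[j] for j in range(n)) for i in range(n)]


def energy(q):
    """Generalized energy (1/2) * sum_k k * q_k^2 (assumed weighting)."""
    return 0.5 * sum((k + 1) * qk * qk for k, qk in enumerate(q))


A = lom_matrix(4)
dt = 0.2  # assumed time step
xi = [1.0, 0.0, 0.0, 0.0]
energies = [energy(xi)]
for _ in range(10):
    xi = euler_step(A, xi, dt)
    energies.append(energy(xi))
print(energies[0], energies[-1])
```

Each Euler step adds (1/2) Δt² (Aξ)ᵀK(Aξ) to the energy, because the first-order cross term vanishes by skew-symmetry. This drift is one concrete face of the finite-differencing error that the control is later asked to absorb.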

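Adding a control term gives the controlled model

$$\mathbf{x}_{k+1} = \mathbf{M}\mathbf{x}_k + \mathbf{B}\mathbf{u}_k, \tag{5.3}$$

where $\mathbf{u}_k \in \mathbb{R}^{p}$ and $\mathbf{B} \in \mathbb{R}^{n \times p}$. Clearly (5.3) is the same as (3.1) with $\boldsymbol{\eta}_k \equiv 0$ and $\mathbf{x}_0 = \mathbf{q}(0)$.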

Equation (5.3) defines the evolution of the spectral amplitudes. Compared to the original equation, the spectral model in (5.3) has two types of model errors: one from the spectral truncation in the Galerkin projection and one due to finite differencing in (4.8) using the first-order method.

### Observations

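We use the exact Fourier coefficients at $t = k\Delta t$ in (4.5), corrupted by additive noise, as the observations in our numerical experiments. Define

$$\mathbf{z}_k = \mathbf{q}(k\Delta t) + \boldsymbol{\nu}_k, \tag{5.4}$$

where $\mathbf{z}_k \in \mathbb{R}^{n}$, $\boldsymbol{\nu}_k \sim N(0, \mathbf{R})$, and $\mathbf{R} \in \mathbb{R}^{n \times n}$ is the observation noise covariance.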

Comparing (5.4) with (3.2), it is immediate that *m* = *n* and **H** = **I**_{n}.

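The cost functional $V_k$ is given by

$$V_k = \tfrac{1}{2}\left\langle \mathbf{z}_k - \mathbf{x}_k,\; \mathbf{R}^{-1}(\mathbf{z}_k - \mathbf{x}_k) \right\rangle + \tfrac{1}{2}\left\langle \mathbf{u}_k,\; \mathbf{C}\mathbf{u}_k \right\rangle,$$

where $\mathbf{C} \in \mathbb{R}^{p \times p}$ is a symmetric and positive definite matrix.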

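The backward equations (5.9) and (5.10) are then solved with $\mathbf{E} = \mathbf{B}\mathbf{C}^{-1}\mathbf{B}^{T}$, $\mathbf{F} = \mathbf{R}^{-1}$, $\mathbf{W} = \mathbf{R}^{-1}$, $\mathbf{P}_N = 0$, and $\mathbf{g}_N = 0$.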

Solving (5.9) and (5.10), we then assemble **u**_{k} using (3.14)–(3.18). Substituting it in (5.1) we get the optimal solution.

#### 1) Experiment 1

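In this experiment $n = m = p = 4$, $\mathbf{B} = \mathbf{I}_4$, and $\mathbf{u}_k \in \mathbb{R}^{4}$. The uncontrolled model is

$$\boldsymbol{\xi}_{k+1} = \mathbf{M}\boldsymbol{\xi}_k \tag{5.11}$$

and the controlled model is

$$\mathbf{x}_{k+1} = \mathbf{M}\mathbf{x}_k + \mathbf{u}_k,$$

both advanced with the same time step $\Delta t$.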

Both models start from the same initial condition *ξ*_{0} = **x**_{0} = (1.1, 0, 0, 0)^{T}, which is different from the one that was used to generate the observations. Consequently, the solution to the unforced model in (5.11) inherits three types of errors: the first because of the spectral truncation, the second because of finite differencing, and the third owing to error in the initial condition. The power of Pontryagin’s approach is to compute the optimal control **u**_{k} such that the control term compensates for all three types of errors.

The observation vector $\mathbf{z}_k \in \mathbb{R}^{4}$ is given by

$$\mathbf{z}_k = \mathbf{q}(k\Delta t) + \boldsymbol{\nu}_k$$

for $1 \le k \le 10$, where $\boldsymbol{\nu}_k \sim N(0, \sigma^{2}\mathbf{I}_4)$, and we set $\mathbf{C} = c\,\mathbf{I}_p$. Substituting these in the expression for $V_k$ in (2.5)–(2.7), it can be verified that

$$V_k = \frac{1}{2\sigma^{2}}\left\|\mathbf{z}_k - \mathbf{x}_k\right\|^{2} + \frac{c}{2}\left\|\mathbf{u}_k\right\|^{2}. \tag{5.14}$$

A comparison of the evolution of the four components of the uncontrolled error, $\mathbf{e}_0 = \boldsymbol{\xi}_k - \mathbf{z}_k \in \mathbb{R}^{4}$, and the corresponding components of the controlled error, $\mathbf{e}_c = \mathbf{x}_k - \mathbf{z}_k \in \mathbb{R}^{4}$, when $\sigma^{2} = 0.001$ is fixed but $c$ is varied through $10^{5}$, $10^{3}$, and 1, is given in Figs. 3a–d. It is clear that the magnitudes of the individual components of the controlled error are uniformly (in time $k$) less than the magnitudes of the corresponding components of the uncontrolled error. Further, the magnitudes of the controlled error decrease with the decrease in the value of the control parameter $c$.

This behavior can be easily explained using (5.14). When the value of the control parameter *c* is large (for a fixed **R**^{−1}), the minimization process forces **u**_{k} to be small. However, if *c* is small, the minimization allows for larger value of **u**_{k}. This increased forcing helps to move **x**_{k} in such a way that **x**_{k} is closer to **z**_{k}. This observation is corroborated by the plots of the evolution of the four components of the control {**u**_{k}} given in Figs. 4a–d. It is evident from Fig. 4 that the magnitude of the initial values of the control increases as the parameter *c* is decreased.

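The root-mean-square (RMS) errors between the $j$th component of the controlled and uncontrolled model solution and the $j$th component of the observations are given by

$$\mathrm{RMS}_c(j) = \left[\frac{1}{N}\sum_{k=1}^{N}\left(x_{k,j} - z_{k,j}\right)^{2}\right]^{1/2}$$

and

$$\mathrm{RMS}_u(j) = \left[\frac{1}{N}\sum_{k=1}^{N}\left(\xi_{k,j} - z_{k,j}\right)^{2}\right]^{1/2}.$$

Table 2 gives the values of these measures for various combinations of the values of $\sigma^{2}$ and $c$. It is clear from Fig. 3 and Table 2 that for a given $\sigma^{2}$, the RMS error of the controlled solution decreases as $c$ decreases.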

Table 2. Root-mean-square errors of the controlled and uncontrolled model solution with observations (*p* = 4, **B** = **I**_{4}).

#### 2) Experiment 2

In this experiment we set *p* = 1 and **B** = (1, 1, 1, 1)^{T}, and all the other parameters are the same as in experiment 1. A comparison of the plots of the observations with controlled and uncontrolled model solution is given in Fig. 5. Table 3 provides a comparison of the RMS errors for various choices of *c*. Recall that when *p* = 1, the same control is applied to every component of the state vector as opposed to when *p* = 4 where different elements of the control vector impact the evolution of the different components of the state vector. Thus in experiment 1 (*p* = 4) the components of the control vector are customized to each component of the state vector and hence the errors are smaller, as borne out by comparing the corresponding elements of Tables 2 and 3. Clearly, larger *p* is better.

Table 3. Root-mean-square errors of the controlled and uncontrolled model solution with observations [*p* = 1, **B** = (1, 1, 1, 1)^{T}].

## 6. Identification of model errors

One of the lofty goals of dynamic data assimilation is to find a correction for model error—errors due to the absence or inappropriate parameterization of physical processes germane to the phenomenon under investigation, and/or incorrect specification of the deterministic model’s control vector (initial conditions, boundary conditions, and physical/empirical parameters). The theory developed in sections 2 and 3 and the illustrations in sections 4 and 5 demonstrate the inherent strength of Pontryagin’s minimum principle as a means of finding this correction.

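Our goal is to find a correction matrix $\delta\mathbf{M} \in \mathbb{R}^{n \times n}$ such that the solution of the corrected but unforced model,

$$\mathbf{x}_{k+1} = (\mathbf{M} + \delta\mathbf{M})\,\mathbf{x}_k, \tag{6.1}$$

matches $\{\mathbf{x}_k\}$, the optimal trajectory of (5.3). Subtracting (5.3) from (6.1), we find

$$\delta\mathbf{M}\,\mathbf{x}_k = \mathbf{y}_k$$

for all $1 \le k \le N$, where $\mathbf{y}_k = \mathbf{B}\mathbf{u}_k$. That is, given the optimal trajectory $\{\mathbf{x}_k\}$ and the optimal input time series $\{\mathbf{y}_k\}$, we seek to find a time invariant linear operator $\delta\mathbf{M}$ that maps $\mathbf{x}_k$ to $\mathbf{y}_k$ for all $1 \le k \le N$.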

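This is posed as a least squares problem: find the $\delta\mathbf{M} \in \mathbb{R}^{n \times n}$ that minimizes $\|\mathbf{Y} - \delta\mathbf{M}\,\mathbf{X}\|^{2}$ in the Frobenius norm, where

$$\mathbf{X} = [\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_N] \in \mathbb{R}^{n \times N}$$

and

$$\mathbf{Y} = [\mathbf{y}_1, \mathbf{y}_2, \ldots, \mathbf{y}_N] \in \mathbb{R}^{n \times N}.$$

From appendix C it readily follows that the optimal $\delta\mathbf{M}$ is

$$\delta\mathbf{M} = \left(\mathbf{Y}\mathbf{X}^{T}\right)\left(\mathbf{X}\mathbf{X}^{T}\right)^{+}, \tag{6.8}$$

where $(\mathbf{X}\mathbf{X}^{T})^{+}$ denotes the generalized inverse of $\mathbf{X}\mathbf{X}^{T}$.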

Those familiar with optimal interpolation method (Gandin 1965) will readily recognize that the first term on the right-hand side of (6.8) is akin to the cross covariance between **x**_{k} and **y**_{k} and the second term is akin to the inverse of the covariance of **x**_{k} with itself. We now illustrate this idea in the following example.
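The estimate can be sketched in a few lines for a two-dimensional state. The sketch assumes that the sum of outer products of the states is nonsingular, so that the generalized inverse reduces to the ordinary inverse; the 2 × 2 size, the sample data, and the function name are all hypothetical choices made for illustration.

```python
def estimate_correction(states, inputs):
    """Least-squares dM with dM x_k ~ y_k: (sum_k y_k x_k^T)(sum_k x_k x_k^T)^{-1}, 2x2 case."""
    yx = [[sum(y[i] * x[j] for x, y in zip(states, inputs)) for j in range(2)]
          for i in range(2)]
    xx = [[sum(x[i] * x[j] for x in states) for j in range(2)] for i in range(2)]
    det = xx[0][0] * xx[1][1] - xx[0][1] * xx[1][0]
    if abs(det) < 1e-12:
        raise ValueError("sum of x x^T is singular; a generalized inverse would be needed")
    inv = [[xx[1][1] / det, -xx[0][1] / det],
           [-xx[1][0] / det, xx[0][0] / det]]
    return [[sum(yx[i][k] * inv[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


# Hypothetical data: inputs generated exactly by a known correction matrix.
true_dm = [[0.1, -0.2], [0.05, 0.3]]
states = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, -1.0)]
inputs = [(true_dm[0][0] * x[0] + true_dm[0][1] * x[1],
           true_dm[1][0] * x[0] + true_dm[1][1] * x[1]) for x in states]
print(estimate_correction(states, inputs))
```

When the inputs are not exactly linear in the states, the same formula returns the best fit in the least squares sense rather than an exact recovery.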

### Example 6.1

For the optimal forcing sequence **y**_{k} = **B** **u**_{k} and its associated optimal trajectory **x**_{k} found in example 5.1 (with *n* = 4), the value of the correction matrix δ**M** follows from (6.8). A comparison of *ζ*_{k} − **z**_{k}, the error of the corrected but uncontrolled model in (6.10), with *ξ*_{k} − **z**_{k}, the error of the uncorrected and uncontrolled model in (5.11), is given in Fig. 6. It is evident from Fig. 6 that the trajectory of the corrected but uncontrolled model fits the observations better.
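The least squares fit behind (6.8) can be sketched numerically. The following is a minimal NumPy illustration, not the authors' code: the function name `fit_correction_matrix`, the row-wise storage of the pairs (**x**_{k}, **y**_{k}), and the synthetic data are all assumptions made for the example, and the generalized inverse is taken to be the Moore-Penrose pseudoinverse.

```python
import numpy as np

def fit_correction_matrix(X, Y):
    """Least squares fit of dM such that dM @ x_k ~= y_k for all k.

    X, Y: arrays of shape (N, n); row k holds x_k and y_k respectively.
    Returns (sum_k y_k x_k^T) @ pinv(sum_k x_k x_k^T).
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    S_yx = Y.T @ X   # sum over k of y_k x_k^T (cross-covariance-like term)
    S_xx = X.T @ X   # sum over k of x_k x_k^T (covariance-like term)
    return S_yx @ np.linalg.pinv(S_xx)

# Synthetic check with hypothetical data (not the experiment of example 6.1):
# generate y_k = dM_true x_k exactly and confirm that dM_true is recovered.
rng = np.random.default_rng(0)
n, N = 4, 50
dM_true = 0.1 * rng.standard_normal((n, n))
X = rng.standard_normal((N, n))
Y = X @ dM_true.T            # row k equals (dM_true @ x_k)^T
dM_hat = fit_correction_matrix(X, Y)
```

When the forcing is not an exact linear function of the state, the same call returns the matrix that minimizes the summed squared misfit rather than an exact map.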

We conclude this section with the following remarks:

- Define a vector **s**(**x**) = (sin **x**, sin 2**x**, sin 3**x**, sin 4**x**)^{T} and define **q**_{1}(**x**, *k*) = **s**^{T}(**x**) *ξ*_{k} and **q**_{2}(**x**, *k*) = **s**^{T}(**x**) *ζ*_{k}, where *ξ*_{k} is the (uncontrolled) model trajectory obtained from (5.1) using the one-step transition matrix **M** and *ζ*_{k} is the (uncontrolled) model trajectory obtained from (6.10) with matrix (**M** + δ**M**). Clearly **q**_{1}(**x**, *k*) and **q**_{2}(**x**, *k*) are approximations to the exact solution **q**(**x**, *t*) in (4.3) at *t* = *k*Δ*t*. It can be verified that **q**_{2}(**x**, *k*) lies closer to **q**(**x**, *k*) than **q**_{1}(**x**, *k*) does, where **q**(**x**, *k*) = **q**(**x**, *t*) at *t* = *k*Δ*t*. That is, the one-step model error correction matrix δ**M** forces the model solution closer to the true solution.
- Only for simplicity in exposition did we pose the inverse problem in (6.3) as an unconstrained problem. In fact, one could readily accommodate a structural constraint on δ**M**, such as requiring it to be a diagonal, tridiagonal, or lower-triangular matrix. Further, we could also readily impose inequality constraints on a selected subset of elements of δ**M**.
- Again, only for simplicity did we obtain a single matrix δ**M** that covers the entire assimilation window and maps **x**_{k} to **y**_{k} for all 1 ≤ *k* ≤ *N*. In principle, we could divide the assimilation window into *L* subintervals. Then we could estimate the correction matrix δ**M**_{q} using only the (**x**_{k}, **y**_{k}) pairs that reside in the *q*th subinterval. In this latter case, we will have a time-varying one-step transition correction matrix δ**M**_{q} for each subinterval, 1 ≤ *q* ≤ *L*.
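Two of the variations mentioned in these remarks, a diagonal structural constraint and a separate correction matrix per subinterval, can be sketched as follows. This is an illustrative NumPy sketch under stated assumptions: the function names, the use of equal-sized contiguous subintervals, and the synthetic data are hypothetical and are not taken from the paper.

```python
import numpy as np

def fit_diagonal_correction(X, Y):
    """Least squares fit restricted to a diagonal matrix diag(d).

    Each d_i decouples: d_i = sum_k y_k[i] x_k[i] / sum_k x_k[i]^2.
    X, Y have shape (N, n), one (x_k, y_k) pair per row.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    return np.diag(np.sum(Y * X, axis=0) / np.sum(X * X, axis=0))

def fit_windowed_corrections(X, Y, L):
    """Split the N pairs into L contiguous subintervals and fit one
    unconstrained correction matrix per subinterval."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    corrections = []
    for Xq, Yq in zip(np.array_split(X, L), np.array_split(Y, L)):
        corrections.append((Yq.T @ Xq) @ np.linalg.pinv(Xq.T @ Xq))
    return corrections

# Hypothetical data: a different true matrix acts in each half of the window.
rng = np.random.default_rng(1)
n, N = 4, 40
dM_a = 0.1 * rng.standard_normal((n, n))
dM_b = 0.1 * rng.standard_normal((n, n))
X = rng.standard_normal((N, n))
Y = np.vstack([X[:20] @ dM_a.T, X[20:] @ dM_b.T])
dM_list = fit_windowed_corrections(X, Y, L=2)

# Hypothetical data for the diagonal case: y_k = diag(d_true) x_k.
d_true = np.array([0.5, -0.2, 0.1, 0.3])
D_hat = fit_diagonal_correction(X, X * d_true)
```

Other structural constraints (tridiagonal, lower triangular) lead to row-wise least squares problems over the permitted entries only; inequality constraints would require a constrained solver and are not shown here.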

## 7. Conclusions

The essence of the PMP-based approach to dynamic data assimilation is the computation of the optimal control sequence **u**_{k} ∈ ℝ^{p}, where the parameter 1 ≤ *p* ≤ *n* denotes the “richness” of the control. It follows from experiments 1 and 2 that a larger value of *p* gives smaller RMS errors. When this sequence is applied to the deterministic model, it forces the model to track the observations as closely as desired, where the closeness is controlled by judicious choices of the relative weights of the two energy terms in the cost functional. More specifically, for a given observational noise covariance matrix, a control weight matrix of the form *c***I**_{p} with a smaller value of the constant *c* provides a better fit between the model and the data. The computation of this optimal control sequence rests on the solution to a nonlinear TPBVP. While the solution to this latter class of problems can be a daunting task, especially for the large-scale problems of interest in the geosciences, several well-tested numerical methods for finding the solution are known and are available as components of several program libraries.

We have demonstrated the power of this approach by applying it to a nontrivial linear advection problem. For this linear problem, the TPBVP reduces to two initial value problems. In addition, we have developed a flexible framework to consolidate the information from the optimal control sequence into a single correction matrix δ**M**.

It should be interesting and valuable to compare the model error corrections obtained using the PMP with those obtained from using the model in a weak constraint formulation.

We are very grateful to Qin Xu and an anonymous reviewer for their comments and suggestions that helped to improve the organization of the paper. S. Lakshmivarahan’s efforts are supported in part by two grants: NSF EPSCOR Track 2 Grant 105-155900 and NSF Grant 105-15400.

# APPENDIX A

## On the Correctness of the Affine Relation between the Costate Variable **λ**_{k} and the State Variable **x**_{k} Given in (3.8)

Since *λ*_{N} = 0, from the second equation in (3.7) we get (A.1), which clearly shows that *λ*_{N−1} is an affine function of **x**_{N−1}. Substituting (A.1) back into the second equation in (3.7), we obtain (A.2). But from the first equation in (3.7), (A.3) follows. Using (A.1) in (A.3) and simplifying, we get (A.4). Now substituting (A.4) into (A.2), it follows that *λ*_{N−2} is given by an expression that is clearly affine in **x**_{N−2}.

Continuing inductively it can be easily verified that *λ*_{k} is an affine function of **x**_{k} as posited in (3.8).
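The inductive argument can be illustrated with a backward sweep for a linear model. The sketch below is not the formulation in (3.7); it assumes, purely for illustration, the model **x**_{k+1} = **M** **x**_{k} + **B** **u**_{k}, directly observed states, and the cost *J* = Σ_{k=0}^{N−1} [½(**z**_{k} − **x**_{k})^{T}**W**(**z**_{k} − **x**_{k}) + ½**u**_{k}^{T}**C** **u**_{k}], for which the costate satisfies *λ*_{N} = 0. The affine ansatz *λ*_{k} = **P**_{k}**x**_{k} − **g**_{k} then turns the two-point boundary value problem into one backward and one forward initial value problem. All names and the test data are hypothetical.

```python
import numpy as np

def lq_tracking(M, B, W, C, z, x0):
    """Optimal control for the assumed linear tracking problem.

    Uses the affine relation lambda_k = P_k x_k - g_k with P_N = 0, g_N = 0.
    z has shape (N, n); returns states x (N+1, n) and controls u (N, p).
    """
    N, n = z.shape
    P = np.zeros((N + 1, n, n))
    g = np.zeros((N + 1, n))
    K = [None] * N
    for k in range(N - 1, -1, -1):          # backward initial value problem
        K[k] = np.linalg.solve(C + B.T @ P[k + 1] @ B, B.T)
        S = np.eye(n) - P[k + 1] @ B @ K[k]
        P[k] = W + M.T @ S @ P[k + 1] @ M
        g[k] = W @ z[k] + M.T @ S @ g[k + 1]
    x = np.zeros((N + 1, n))
    x[0] = x0
    u = np.zeros((N, B.shape[1]))
    for k in range(N):                      # forward initial value problem
        u[k] = -K[k] @ (P[k + 1] @ M @ x[k] - g[k + 1])
        x[k + 1] = M @ x[k] + B @ u[k]
    return x, u

def cost(M, B, W, C, z, x0, u):
    """Evaluate the assumed cost J for a given control sequence u."""
    x, J = x0.copy(), 0.0
    for k in range(len(u)):
        e = z[k] - x
        J += 0.5 * e @ W @ e + 0.5 * u[k] @ C @ u[k]
        x = M @ x + B @ u[k]
    return J

# Hypothetical setup with n = 4 and a single control (p = 1).
rng = np.random.default_rng(2)
n, N = 4, 12
M = 0.3 * rng.standard_normal((n, n))
B = np.ones((n, 1))
W, C = np.eye(n), 0.1 * np.eye(1)
z = rng.standard_normal((N, n))
x0 = rng.standard_normal(n)
x_opt, u_opt = lq_tracking(M, B, W, C, z, x0)
J_opt = cost(M, B, W, C, z, x0, u_opt)
J_perturbed = [cost(M, B, W, C, z, x0, u_opt + 1e-2 * rng.standard_normal(u_opt.shape))
               for _ in range(20)]
```

Because the assumed cost is strictly convex in the controls, any perturbation of the computed sequence should increase the cost, which is what the perturbed evaluations are meant to confirm.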

# APPENDIX B

## Solution of the LOM(*n*) in (4.6)

In this appendix we analyze the eigenstructure of the LOM(*n*) matrix in (4.8).

### a. Eigenstructure of the LOM(*n*) matrix

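Since the matrix of the LOM(*n*) is tridiagonal, we begin with the general case. Let **T**_{k} ∈ ℝ^{k×k} be a general tridiagonal matrix of the form

$$\mathbf{T}_{k} = \begin{pmatrix} b_{1} & c_{1} & & & \\ a_{2} & b_{2} & c_{2} & & \\ & \ddots & \ddots & \ddots & \\ & & a_{k-1} & b_{k-1} & c_{k-1} \\ & & & a_{k} & b_{k} \end{pmatrix} \tag{B.1}$$

for 1 ≤ *k* ≤ *n*. Let **D**_{i} denote the determinant of the principal submatrix consisting of the first *i* rows and *i* columns of **T**_{k}. Then the determinant **D**_{k} of the matrix **T**_{k} in (B.1) is obtained by applying the Laplace expansion to the *k*th (last) row of **T**_{k} and is given by the second-order linear recurrence

$$\mathbf{D}_{k} = b_{k}\,\mathbf{D}_{k-1} - a_{k}\,c_{k-1}\,\mathbf{D}_{k-2}, \tag{B.2}$$

where **D**_{0} = 1 and **D**_{1} = *b*_{1}.

Setting *b*_{i} ≡ 0 for all 1 ≤ *i* ≤ *n*, and taking *c*_{i} for 1 ≤ *i* ≤ (*n* − 1) and *a*_{i} for 2 ≤ *i* ≤ *n* in (B.1) to be the off-diagonal elements of the LOM(*n*) matrix, it can be verified that **T**_{n} reduces to that matrix and that (B.2) reduces to

$$\mathbf{D}_{k} = -a_{k}\,c_{k-1}\,\mathbf{D}_{k-2}, \tag{B.3}$$

with **D**_{0} = 1 and **D**_{1} = 0. Iterating (B.3), it can be verified that **D**_{k} = 0 for every odd *k*, while for even *k* the determinant **D**_{k} is the product of the factors −*a*_{j}*c*_{j−1} over the even indices *j* ≤ *k*. Thus, the LOM(*n*) matrix is singular when *n* is odd. Henceforth we only consider the case when *n* is even. Refer to Table B1 for values of the determinant of the LOM(*n*) matrix for *n* ≤ 10.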
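The second-order recurrence for the determinant of a tridiagonal matrix is easy to check numerically. The sketch below is a generic illustration and does not use the actual LOM(*n*) entries; the function names, the zero-based storage of the sub- and superdiagonals, and the random test entries are assumptions of the example.

```python
import numpy as np

def tridiag_det(sub, diag, sup):
    """Determinant of a tridiagonal matrix by the three-term recurrence.

    diag has length k; sub and sup have length k - 1, with sub[i] at
    position (i + 1, i) and sup[i] at position (i, i + 1).
    """
    d_prev2, d_prev1 = 1.0, diag[0]          # D_0 = 1, D_1 = b_1
    for k in range(1, len(diag)):
        d_prev2, d_prev1 = (
            d_prev1,
            diag[k] * d_prev1 - sub[k - 1] * sup[k - 1] * d_prev2,
        )
    return d_prev1

def dense_tridiag(sub, diag, sup):
    """Assemble the same matrix densely, for comparison only."""
    return np.diag(diag) + np.diag(sub, -1) + np.diag(sup, 1)

rng = np.random.default_rng(3)
checks = []
for k in range(2, 8):
    sub, sup = rng.standard_normal(k - 1), rng.standard_normal(k - 1)
    diag = rng.standard_normal(k)
    checks.append(np.isclose(tridiag_det(sub, diag, sup),
                             np.linalg.det(dense_tridiag(sub, diag, sup))))

# With a zero main diagonal and odd order, the recurrence returns zero.
det_odd_zero_diag = tridiag_det(rng.standard_normal(4), np.zeros(5),
                                rng.standard_normal(4))
```

The last line reproduces, for arbitrary off-diagonal values, the observation that a zero main diagonal forces the determinant to vanish when the order is odd.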

Table B1. Determinant, characteristic polynomial, and eigenvalues of the LOM(*n*) matrix for *n* ≤ 10.

Similarly, setting *b*_{i} = −*λ* for all 1 ≤ *i* ≤ *n* while retaining the off-diagonal elements of the LOM(*n*) matrix, the determinant **D**_{n} of the tridiagonal matrix in (B.1) represents the characteristic polynomial of the LOM(*n*) matrix. The recurrence for **D**_{k} then takes the form (B.5), with **D**_{0} = 1 and **D**_{1} = −*λ*. Iterating (B.5) leads to the sequence of characteristic polynomials of the LOM(*n*) matrix for increasing *n*. Table B1 also gives the characteristic polynomial and the eigenvalues of the LOM(*n*) matrix for *n* ≤ 10. From this table it is clear that the absolute value of the largest eigenvalue increases and that of the smallest (nonzero) eigenvalue decreases with *n*. It turns out that the larger eigenvalues correspond to the high-frequency components and the smaller eigenvalues correspond to the low-frequency components that make up the solution **q**(*t*) of (4.8).

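### b. Jordan canonical form for the LOM(*n*) matrix

Let **A** denote the LOM(*n*) matrix in (4.8), let **Λ** ∈ ℂ^{n×n} denote the diagonal matrix of eigenvalues of **A**, and let **V** ∈ ℂ^{n×n} denote a nonsingular matrix of the corresponding eigenvectors; that is,

$$\mathbf{A}\mathbf{V} = \mathbf{V}\boldsymbol{\Lambda}.$$

Then

$$\mathbf{A} = \mathbf{V}\boldsymbol{\Lambda}\mathbf{V}^{-1} \tag{B.7}$$

and

$$e^{\mathbf{A}t} = \mathbf{V}\,e^{\boldsymbol{\Lambda}t}\,\mathbf{V}^{-1},$$

where the diagonal elements of **Λ** are the purely imaginary pairs ±*iλ*_{i} of eigenvalues of **A**.

#### Solution of (4.8):

The solution **q**(*t*) of (4.8) is given by

$$\mathbf{q}(t) = e^{\mathbf{A}t}\,\mathbf{q}(0). \tag{B.10}$$

Using (B.7) in (B.10), it can be shown that

$$\mathbf{q}(t) = \mathbf{V}\,e^{\boldsymbol{\Lambda}t}\,\mathbf{V}^{-1}\mathbf{q}(0)$$

or

$$\bar{\mathbf{q}}(t) = e^{\boldsymbol{\Lambda}t}\,\bar{\mathbf{q}}(0),$$

where $\bar{\mathbf{q}}(t) = \mathbf{V}^{-1}\mathbf{q}(t)$ and $\bar{\mathbf{q}}(0) = \mathbf{V}^{-1}\mathbf{q}(0)$.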

We conclude this appendix with the following.

##### Example (B.1).

For *n* = 4, each component **q**_{i}(*t*) of the solution is a linear combination of the harmonic terms cos(*λ*_{k}*t*) and sin(*λ*_{k}*t*), with coefficients determined by the *i*th row of the matrix of eigenvectors.
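The eigenvector-based solution of a linear system d**q**/d*t* = **A** **q** can be sketched as follows. The 4 × 4 matrix used here is a hypothetical skew-symmetric tridiagonal matrix with zero diagonal, chosen only because its eigenvalues are purely imaginary; it is not the LOM(4) matrix of the paper, and the function name is likewise an assumption.

```python
import numpy as np

def solve_linear_ode(A, q0, t):
    """Return q(t) = V exp(Lambda t) V^{-1} q0 for dq/dt = A q.

    Assumes A is diagonalizable. The result is real for real A and q0,
    so the roundoff-sized imaginary part is discarded.
    """
    lam, V = np.linalg.eig(A)
    coeffs = np.linalg.solve(V, np.asarray(q0, dtype=complex))  # V^{-1} q0
    return (V @ (np.exp(lam * t) * coeffs)).real

# Hypothetical test matrix: zero diagonal, superdiagonal (1, 2, 3),
# subdiagonal (-1, -2, -3), hence skew-symmetric.
sup = np.array([1.0, 2.0, 3.0])
A = np.diag(sup, 1) - np.diag(sup, -1)
q0 = np.array([1.0, 0.5, -0.3, 0.2])

eigenvalues = np.linalg.eigvals(A)
q_start = solve_linear_ode(A, q0, 0.0)
q_later = solve_linear_ode(A, q0, 0.7)

# Finite-difference check that the returned trajectory satisfies dq/dt = A q.
h = 1e-5
dq_dt = (solve_linear_ode(A, q0, 0.7 + h) - solve_linear_ode(A, q0, 0.7 - h)) / (2 * h)
```

For a skew-symmetric matrix the flow is a rotation, so the Euclidean norm of the state is conserved and each component oscillates as a combination of cosines and sines of the eigenfrequencies.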

# APPENDIX C

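## Gradient of *f*(δ**M**) in (6.3)

Let *f*: ℝ^{n×n} → ℝ be a scalar-valued function of *n* × *n* matrices. Then, by definition, the gradient ∇*f*(δ**M**) is the *n* × *n* matrix whose (*i*, *j*)th element is the partial derivative of *f* with respect to the (*i*, *j*)th element of δ**M**. To compute the gradient of *α*(**x**) in (6.5), let

$$\delta\mathbf{M} = \begin{pmatrix} \mathbf{m}_{1}^{T} \\ \vdots \\ \mathbf{m}_{n}^{T} \end{pmatrix}$$

be a row partition of δ**M**, so that

$$\alpha(\mathbf{x}) = \left\| \delta\mathbf{M}\,\mathbf{x} \right\|^{2} = \sum_{i=1}^{n} \left( \mathbf{m}_{i}^{T}\mathbf{x} \right)^{2}.$$

The gradient of *α*(**x**) with respect to the column vector **m**_{i} is $2(\mathbf{m}_{i}^{T}\mathbf{x})\,\mathbf{x}$, and collecting these as rows gives

$$\nabla \alpha = 2\,\delta\mathbf{M}\,\mathbf{x}\,\mathbf{x}^{T}.$$

### a. Gradient of **β**(δ**M**, **x**, **y**) in (6.6)

With $\beta(\delta\mathbf{M}, \mathbf{x}, \mathbf{y}) = \mathbf{y}^{T}\,\delta\mathbf{M}\,\mathbf{x} = \sum_{i=1}^{n} y_{i}\,\mathbf{m}_{i}^{T}\mathbf{x}$, the gradient with respect to **m**_{i} is $y_{i}\,\mathbf{x}$, and hence

$$\nabla \beta = \mathbf{y}\,\mathbf{x}^{T}.$$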
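Gradient expressions of this kind can be checked against finite differences. The sketch below assumes the summed squared misfit *f*(δ**M**) = Σ_{k} ‖**y**_{k} − δ**M** **x**_{k}‖², whose gradient is 2(δ**M** Σ_{k}**x**_{k}**x**_{k}^{T} − Σ_{k}**y**_{k}**x**_{k}^{T}); the function names and the random test data are hypothetical.

```python
import numpy as np

def f(dM, X, Y):
    """Summed squared misfit; row k of X, Y holds x_k, y_k."""
    R = Y - X @ dM.T              # row k equals (y_k - dM x_k)^T
    return np.sum(R * R)

def grad_f(dM, X, Y):
    """Analytic gradient of f with respect to the elements of dM."""
    return 2.0 * (dM @ (X.T @ X) - Y.T @ X)

def numerical_grad(fun, dM, h=1e-6):
    """Central-difference gradient, one matrix element at a time."""
    G = np.zeros_like(dM)
    for i in range(dM.shape[0]):
        for j in range(dM.shape[1]):
            E = np.zeros_like(dM)
            E[i, j] = h
            G[i, j] = (fun(dM + E) - fun(dM - E)) / (2 * h)
    return G

rng = np.random.default_rng(4)
n, N = 4, 30
X = rng.standard_normal((N, n))
Y = rng.standard_normal((N, n))
dM0 = rng.standard_normal((n, n))

G_analytic = grad_f(dM0, X, Y)
G_numeric = numerical_grad(lambda D: f(D, X, Y), dM0)

# At the least squares solution the gradient should vanish.
dM_star = (Y.T @ X) @ np.linalg.pinv(X.T @ X)
G_at_optimum = grad_f(dM_star, X, Y)
```

Setting the analytic gradient to zero gives the normal equations, whose solution is the product of the cross-product sum and the generalized inverse of the state outer-product sum.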

## REFERENCES

Abramov, R. V., G. Kovacic, and A. J. Majda, 2003: Hamiltonian structure and statistically relevant conserved quantities for the truncated Burgers–Hopf equation. *Commun. Pure Appl. Math.*, **56**, 1–46.

Anthes, R. A., 1974: Data assimilation and initialization of hurricane prediction model. *J. Atmos. Sci.*, **31**, 702–719.

Athans, M., and P. L. Falb, 1966: *Optimal Control.* McGraw-Hill, 879 pp.

Bennett, A., 1992: *Inverse Methods in Physical Oceanography.* Cambridge University Press, 346 pp.

Bennett, A., and M. A. Thorburn, 1992: The generalized inverse of a nonlinear quasigeostrophic ocean circulation model. *J. Phys. Oceanogr.*, **22**, 213–230.

Bennett, S., 1996: A brief history of automatic control. *IEEE Control Syst.*, **16**, 17–25.

Bergman, K. H., 1979: Multivariate analysis of temperatures and winds using optimum interpolation. *Mon. Wea. Rev.*, **107**, 1423–1444.

Bergthórsson, P., and B. Döös, 1955: Numerical weather map analysis. *Tellus*, **7**, 329–340.

Boltyanskii, V. G., 1971: *Mathematical Methods of Optimal Control.* Holt, Rinehart and Winston, 272 pp.

Boltyanskii, V. G., 1978: *Optimal Control of Discrete Systems.* John Wiley and Sons, 392 pp.

Bryson, A. E., 1996: Optimal control-1950 to 1985. *IEEE Control Syst.*, **16**, 26–33.

Bryson, A. E., 1999: *Dynamic Optimization.* Addison-Wesley, 434 pp.

Canon, M. D., C. D. Cullum Jr., and E. Polak, 1970: *Theory of Optimal Control and Mathematical Programming.* McGraw-Hill, 285 pp.

Carrier, G. F., and C. E. Pearson, 1976: *Partial Differential Equations: Theory and Techniques.* Academic Press, 320 pp.

Catlin, D. E., 1989: *Estimation, Control and the Discrete Kalman Filter.* Springer-Verlag, 274 pp.

Dee, D. P., and A. M. da Silva, 1998: Data assimilation in the presence of forecast bias. *Quart. J. Roy. Meteor. Soc.*, **124**, 269–295.

Derber, J., 1989: A variational continuous assimilation technique. *Mon. Wea. Rev.*, **117**, 2437–2446.

Eliassen, A., 1954: Provisional report on the calculation of spatial covariance and autocorrelation of pressure field. Institute of Weather and Climate Research, Academy of Sciences Rep. 5, 12 pp. [Available from Norwegian Meteorological Institute, P.O. Box 43, Blindern, N-0313, Oslo, Norway.]

Friedland, B., 1969: Treatment of bias in recursive filtering. *IEEE Trans. Autom. Control*, **14**, 359–367.

Gandin, L. S., 1965: *Objective Analysis of Meteorological Fields.* Israel Program for Scientific Translations, 242 pp.

Goldstein, H. H., 1950: *Classical Mechanics.* Addison-Wesley, 399 pp.

Goldstein, H. H., 1980: *A History of the Calculus of Variations from the 17th through the 19th Century.* Springer-Verlag, 410 pp.

Griffith, A. K., and N. K. Nichols, 2001: Adjoint methods in data assimilation for estimating model error. *Flow, Turbul. Combust.*, **65**, 469–488.

Hirsch, M. W., and S. Smale, 1974: *Differential Equations, Dynamical Systems, and Linear Algebra.* Academic Press, 358 pp.

Kalman, R. E., 1963: The theory of optimal control and calculus of variations. *Mathematical Optimization Techniques,* R. Bellman, Ed., University of California Press, 309–329.

Kalnay, E., 2003: *Atmospheric Modeling, Data Assimilation, and Predictability.* Cambridge University Press, 341 pp.

Keller, H. B., 1976: *Numerical Solution of Two Point Boundary Value Problems.* Regional Conference Series in Applied Mathematics, Vol. 24, SIAM Publications, 69 pp.

Kuhn, H. W., and A. W. Tucker, 1951: Nonlinear programming. *Proc. Second Berkeley Symp. on Mathematical Statistics and Probability,* Berkeley, CA, University of California, Berkeley, 481–492.

Lakshmivarahan, S., and S. K. Dhall, 1990: *Analysis and Design of Parallel Algorithm: Arithmetic and Matrix Problems.* McGraw-Hill, 657 pp.

Lakshmivarahan, S., and J. M. Lewis, 2013: Nudging: A critical overview. *Data Assimilation for Atmospheric, Oceanic and Hydrologic Applications,* Vol. 2, S. K. Park and L. Liang, Eds., Springer-Verlag, in press.

Lewis, F. L., 1986: *Optimal Control.* John Wiley and Sons, 362 pp.

Lewis, J. M., 1972: An operational upper air analysis using the variational methods. *Tellus*, **24**, 514–530.

Lewis, J. M., and S. Lakshmivarahan, 2008: Sasaki’s pivotal contribution: Calculus of variation applied to weather map analysis. *Mon. Wea. Rev.*, **136**, 3553–3567.

Lewis, J. M., S. Lakshmivarahan, and S. K. Dhall, 2006: *Dynamic Data Assimilation: A Least Squares Approach.* Cambridge University Press, 654 pp.

Lynch, P., 2006: *The Emergence of Numerical Weather Prediction: Richardson’s Dream.* Cambridge University Press, 279 pp.

Majda, A. J., and I. Timofeyev, 2000: Remarkable statistical behavior for truncated Burgers–Hopf dynamics. *Proc. Natl. Acad. Sci. USA*, **97**, 12 413–12 417.

Majda, A. J., and I. Timofeyev, 2002: Statistical mechanics for truncations of Burgers–Hopf equation: A model for intrinsic stochastic behavior with scaling. *Milan J. Math.*, **70**, 39–96.

Menard, R., and R. Daley, 1996: The application of Kalman smoother theory to estimation of 4DVAR error statistics. *Tellus*, **48A**, 221–237.

Naidu, D. S., 2003: *Optimal Control Systems.* CRC Press, 433 pp.

Platzman, G. W., 1964: An exact integral of complete spectral equations for unsteady one-dimensional flow. *Tellus*, **16**, 422–431.

Polak, E., 1997: *Optimization.* Springer, 779 pp.

Pontryagin, L. S., V. G. Boltyanskii, R. V. Gamkrelidze, and E. F. Mischenko, 1962: *The Mathematical Theory of Optimal Control Processes.* John Wiley, 360 pp.

Roberts, S. M., and J. S. Shipman, 1972: *Two-Point Boundary Value Problems: Shooting Method.* Elsevier, 289 pp.

Rouch, H. E., F. Tung, and C. T. Striebel, 1965: Maximum likelihood estimates of linear dynamic systems. *J. Amer. Inst. Aeronaut. Astronaut.*, **3**, 1445–1450.

Sasaki, Y., 1958: An objective analysis based on the variational method. *J. Meteor. Soc. Japan*, **36**, 77–88.

Sasaki, Y., 1970a: Some basic formulations in numerical variational analysis. *Mon. Wea. Rev.*, **98**, 875–883.

Sasaki, Y., 1970b: Numerical variational analysis formulated under the constraints as determined by longwave equations and low-pass filter. *Mon. Wea. Rev.*, **98**, 884–898.

Sasaki, Y., 1970c: Numerical variational analysis with weak constraint and application to surface analysis of severe storm gust. *Mon. Wea. Rev.*, **98**, 899–910.

Shen, J., T. Tang, and L. L. Wang, 2011: *Spectral Methods.* Springer-Verlag, 470 pp.

Wiener, N., 1948: *Cybernetics: Control and Communication in the Animal and Machine.* John Wiley, 194 pp.

Wiin-Nielsen, A., 1991: The birth of numerical weather prediction. *Tellus*, **43A**, 36–52.

Zupanski, D., 1997: A general weak constraint applicable to operational 4DVAR data assimilation system. *Mon. Wea. Rev.*, **125**, 2274–2292.