Abstract

Over the decades, the role of observations in building and/or improving the fidelity of a model to a phenomenon has been well documented in the meteorological literature. More recently, adaptive/targeted observations have been routinely used to improve the quality of the analysis resulting from the fusion of data with models in a data assimilation scheme and the subsequent forecast. In this paper our goal is to develop an offline (preprocessing) diagnostic strategy for placing observations with the singular view of reducing the forecast error/innovation in the context of classical 4D-Var. It is well known that the shape of the cost functional, as measured by its gradient (also called the adjoint gradient or sensitivity) in the control (initial condition and model parameters) space, determines the marching of the control iterates toward a local minimum. These iterates can become marooned in regions of control space where the gradient is small. An open question is how to avoid these “flat” regions by bounding the norm of the gradient away from zero. We answer this question in two steps. We, for the first time, derive a linear transformation defined by a symmetric positive semidefinite (SPSD) Gramian G = F¯^TF¯ that directly relates the control error to the adjoint gradient. It is then shown that by placing observations where the square of the Frobenius norm of F¯ (which is also the sum of the eigenvalues of G) is a maximum, we can indeed bound the norm of the adjoint gradient away from zero.

1. Introduction

Judicious selection of the type (pressure, humidity, and wind, to name a few) and placement (the location within a given spatiotemporal domain of interest) of weather observations to advance understanding of a phenomenon and the concomitant practical ability to improve forecasts has a rich history in meteorology. Among the most celebrated uses of weather observations to understand the ubiquitous extratropical cyclone was the fieldwork of Scandinavian meteorologists Jacob Bjerknes and Erik Palmén (Bjerknes and Palmén 1937). They organized the simultaneous release of 120 radiosondes from 11 European countries over a 2-day period that led to diagnosis of motions in the upper troposphere along a 3500-mi (5630-km) latitudinal cross section. Further, the study made it clear that the meridional gradient of the Coriolis parameter was an important element in the dynamics of upper waves. This revelation inspired Carl-Gustaf Rossby to derive his celebrated wave formula (Eliassen 1995).

In the spirit of the Bjerknes–Palmén study, but in our current day, the use of research aircraft to obtain thermodynamic and wind structure in the vicinity of Atlantic hurricanes through use of Omega dropwindsondes (ODWs) has gone far to improve the track forecasting of hurricanes (Burpee et al. 1996). Transmission of these data to operational prediction centers in real time permitted inclusion of these data into the operational models. The improvement in track forecasting as tested on numerous hurricanes was impressive—track error reductions from 16% to 30%. Also impressive was the hurricane forecasting experience of the Burpee team that led them to choose observation locations on the hurricane’s periphery as well as near the storm’s center, where the thermodynamic and wind data were critically important as input to track forecasting. Majumdar (2016) and Majumdar et al. (2011) present an excellent review of “targeted observations,” which classified Burpee et al. (1996) as an example of targeting based on synoptic reasoning—not only an appropriate label for this contribution but also for the earlier contribution by Bjerknes and Palmén (1937).

Starting from the mid-1990s there has been a growing body of work in the meteorological literature related to (i) the analysis of the impact of initial conditions and observations on a chosen scalar metric of forecast quality (over a subdomain or the entire domain of interest); (ii) targeted/adaptive observation, aimed at finding the type of a new set of observations that will improve forecast quality, such as by reducing the analysis error covariance; and (iii) the case of a large set of correlated observations (as may happen with satellites or radars), where there has been great interest in so-called “thinning” and/or in creating superobservations by computing, say, an average over a given subdomain and associating that observation with the centroid of the subdomain.

The techniques used in these areas may be broadly divided into two groups: those based on the adjoint approach rooted in classical variational analysis, and those using one of the many ensemble implementations of the Kalman filtering approach to data assimilation. While much of the literature concentrates on the choice of a suitable type of observations to improve forecast quality, a few studies also consider the distribution of observations toward a similar goal.

Specifically, Berliner et al. (1999) examine the application of statistical experimental design (Cochran and Cox 1992) to adaptively decide on supplementary observations that would improve the analyses. Torn and Hakim (2008), Ancell and Hakim (2007, hereafter referred to as AH2007), and Hakim and Torn (2008) use ensemble methods for targeting observations. AH2007 also contains a comparison of the adjoint-based and ensemble-based approaches to observation targeting. Lorenz and Emanuel (1998), Langland et al. (1999), Langland and Baker (2004), Evans et al. (1998), Koch et al. (2018), Manohar et al. (2018), Kotsuki et al. (2019), Majumdar et al. (2011), and Majumdar (2016) contain extensive discussions of “thinning,” creating superobservations, and targeted-observation strategies, including adjoint sensitivity, forecast sensitivity to observations, the ensemble transform filter method, and ensemble sensitivity. All these strategies are designed to improve analyses and ultimately reduce forecast error.

Our approach falls outside the strategies used in targeted/adaptive observations, although the goal has strong overlap with those methodologies. It is assumed that we have observations of a given type represented by a vector Z in the observation space R^m. It is well known that when the model map M and/or the forward operator h (that relates the observation to the state) are nonlinear, the cost functional may exhibit multiple minima (Lewis et al. 2006, example 24.4.1 in chapter 24). In addition to these intrinsic multiple minima, which depend purely on the given nonlinearities in M and h and on the covariance matrix of the observational noise, the shape of the cost functional in the control space (consisting of initial conditions and model parameters) is also critically dependent on the distribution (location) of the observations in the state space. Our goal is to control the shape of the cost functional by analyzing the dependence of its gradient (with respect to the control) on the (spatiotemporal) location of the observations; namely, we force the magnitude of this gradient to be bounded away from zero outside the intrinsic local minima.

This is achieved through a strategy that decides on observation placement in space and time in the context of classical four-dimensional variational data assimilation (4D-Var) (Le Dimet and Talagrand 1986; Lewis and Derber 1985). It is well known that the shape of the cost function, as measured by its gradient in control space, determines the marching of the control iterates toward a local minimum of the cost function. The iterates of the minimization algorithm can become marooned in regions where the gradient is very small. An open question is how to avoid these “flat” regions in control space by guaranteeing that the norm of the gradient of the cost function is bounded away from zero. While much of the earlier work deals with the sensitivity of the cost functional with respect to the initial conditions alone, our analysis deals with the initial conditions and the parameters simultaneously, on the same footing.

This question is answered in two steps. First, we derive a linear transformation defined by a symmetric positive semidefinite (SPSD) Gramian¹ G = F¯^TF¯ that maps the error in the control to the gradient of the cost functional, alias the adjoint sensitivity or gradient,² where the component matrix F¯ is the product of 1) the forward sensitivity of the model solution, 2) the Jacobian of the forward operator, and 3) the square root of the inverse of the covariance matrix of the observation noise. Refer to Fig. 1 for an illustration. It is then shown that by placing observations where the square of the Frobenius norm of F¯ attains a maximum, we can indeed bound the norm of the adjoint gradient away from zero.

Fig. 1.

A pictorial view of the linear relation g = Gf between f = δc, the error in the control c, and g = −∇cJ, the negative of the gradient of the cost function J(c), where G = F¯^TF¯. The component g^ is the orthogonal projection of g along f. That is, g^ = f^⟨f^, g⟩ = Pf g, where f^ is the unit vector along f and Pf is the orthogonal projection matrix along f.


In section 2, we provide a short summary of the forecast error correction problem by relating the adjoint sensitivity or adjoint gradient to the innovation or forecast error. By relating this forecast error to the model’s forward sensitivity, we derive our main results in section 3. The intrinsic relation that exists between the adjoint sensitivity and the control error is further analyzed in section 4. The impact of observation placement on controlling the shape of the cost functional is analyzed in section 5. Two examples are considered in section 6: an air–sea interaction problem in time alone, and Saltzman’s (1962) more complicated nonlinear model of convection in space and time. Both examples exhibit the methodology’s avoidance of flat gradients in control space. Section 7 contains recommendations for placement of observations and concluding remarks. Appendix A contains mathematical details that are key to the analysis in sections 3–5. The Gramian structure of G is developed in appendix B. In the short appendix C, we further illustrate the difference between our approach and a typical adjoint sensitivity analysis in AH2007.

2. Forecast error correction problem

a. Model equations

Let x(k) = [x1(k), x2(k), …, xn(k)]^T ∈ R^n be the state of a nonlinear, deterministic, discrete time dynamic model at times k = 0, 1, 2, …. Let α = (α1, α2, …, αp)^T ∈ R^p be the vector of p model parameters, and let M: R^n × R^p → R^n denote the (one-step state transition) model map, where M(x, α) = [M1(x, α), M2(x, α), …, Mn(x, α)]^T ∈ R^n. Let

 
x(k+1)=M[x(k),α]
(2.1)

be the given dynamical system whose solution starting from an initial state x(0) denotes the model forecast. Let x(k) = x[k, x(0), α] denote the solution of (2.1). Since the pair c = [x^T(0), α^T]^T ∈ R^n × R^p controls the evolution of x(k), it is known as the control. It is tacitly assumed that the model in (2.1) is the best known model for the process in question and is suitably parameterized to capture the inherent natural variations. Clearly, simultaneous estimation of the initial conditions and parameters in dynamic data assimilation puts this problem on par with the goals and methods used in the “adaptive control” literature (Narendra and Annaswamy 2005).

b. Observation

Let z(k) ∈ Rm be the vector observation at time k given by

 
z(k) = h[x¯(k)] + ξ(k),
(2.2)

where h: R^n → R^m with h(x) = [h1(x), h2(x), …, hm(x)]^T ∈ R^m is called the forward operator that relates the model space R^n and the observation space R^m; x¯(k) is the true but unknown state of nature at time k, and ξ(k) ~ N[0, R(k)] is a zero mean, temporally uncorrelated, Gaussian noise with R(k) ∈ R^{m×m} as its covariance matrix at time k. It is assumed that R(k) is known and positive definite in addition to being symmetric. In the case that the forward operator is linear, (2.2) takes the form z(k) = Hx¯(k) + ξ(k), where H ∈ R^{m×n} is a matrix that takes model output and converts it to the model’s counterpart of the observations.
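As a minimal numerical sketch of (2.2) in the linear case (the sizes n = 2 and m = 1, the operator H, the true state, and the noise level are all illustrative assumptions, not values from the text), the observation is the true state mapped through H plus zero-mean Gaussian noise:

```python
import random

def observe(x_true, H, sigma, rng):
    """z = H x_true + xi, with xi ~ N(0, sigma^2); here m = 1, so H is a row."""
    mean = sum(Hj * xj for Hj, xj in zip(H, x_true))
    return mean + rng.gauss(0.0, sigma)

rng = random.Random(0)
H = [1.0, 0.5]          # hypothetical 1x2 forward operator
x_true = [2.0, -1.0]    # hypothetical true state x_bar(k)
z = observe(x_true, H, sigma=0.1, rng=rng)
```

With sigma = 0 the sketch reduces to the noiseless map z = Hx¯(k); the nonzero sigma case illustrates why e¯(k) below is never exactly zero even with a perfect control.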

c. Assumption

It is assumed that the model map M in (2.1), the forward operator h in (2.2) and the model solution x(k) are continuously differentiable in their arguments. That is, the following Jacobians are well defined for all x, α, and k ≥ 0:

 
DM(k) = [∂Mi(x, α)/∂xj]|x=x(k) ∈ R^{n×n}, DMα(k) = [∂Mi(x, α)/∂αj]|x=x(k) ∈ R^{n×p}, Dh(k) = [∂hi(x)/∂xj]|x=x(k) ∈ R^{m×n},
(2.3a)
 
U(k) = [Uij(k)] = [∂xi(k)/∂xj(0)] ∈ R^{n×n} and V(k) = [Vij(k)] = [∂xi(k)/∂αj] ∈ R^{n×p}.
(2.3b)

d. Innovation/forecast error

Let z(k) be a single observation given at time k. The difference

 
e¯(k) = z(k) − h[x(k)]
(2.4)

is called the innovation or the forecast error. Refer to Fig. 2 for an illustration.

Fig. 2.

x¯(k) is the true but unknown state of nature at time k starting from the true but unknown control c¯ = [x¯^T(0), α¯^T]^T. x(k) is the model forecast at time k starting from the erroneous control c = [x^T(0), α^T]^T ≠ c¯. δx(k) = x¯(k) − x(k) is the first variation in the model state at time k induced by the first variation δc = [δx^T(0), δα^T]^T, where δx(0) = x¯(0) − x(0) and δα = α¯ − α. z(k) is the observation at time k that contains information on x¯(k) through the relation z(k) = h[x¯(k)]. The forecast error is e¯(k) = z(k) − h[x(k)].


e. A classification of forecast errors

For the purposes of our analysis in this paper, it is useful to classify the forecast errors into three types (Lakshmivarahan and Lewis 2010; Lakshmivarahan et al. 2017). Type 1 error is said to occur when the model is assumed to be perfect and the true control c¯=[x¯T(0),α¯T]T is known. In this case, x(k)=x¯(k) and the forecast error is temporally uncorrelated and is a pure Gaussian noise that cannot be controlled. Type 2 error occurs when the model is perfect, but the control c differs from the true but unknown control c¯. In this case, the forecast error e¯(k) depends only on the control error, δc = [δxT(0), δαT]T and

 
δx(k) = x¯(k) − x(k) for k ≥ 0, δα = α¯ − α.
(2.5)

Type 3 error is said to occur when the model is not perfect and the true control is not known. In this case, the forecast error is confounded by the unknown model error and the unknown control error. In this paper, we confine our analysis to the simpler case of type 2 errors. Estimation of type 3 errors is contained in Lakshmivarahan et al. (2013).

f. Statement of the problem

The forecast error correction problem is usually recast as the minimization of the weighted sum of squared error criterion. To this end, define a cost functional J: Rn × RpR given by

 
J(c) = (1/2)‖e¯(k)‖²_{R^−1(k)} = (1/2)⟨e¯(k), R^−1(k)e¯(k)⟩,
(2.6)

where ⟨a, b⟩ denotes the standard inner product. Again, we hasten to add that if prior information on either the initial condition x(0) or the parameter α or both is available, we could add a quadratic penalty term to the cost function, which would contribute another term to the expression for the adjoint sensitivity.

Since e¯(k) depends on the forecast x(k) that in turn is a function of the control c, the cost functional is an implicit function of c through the given model Eq. (2.1). We hasten to add that since the cost functional is additive in the number of observations, without loss of generality, it is assumed that there is only one observation z(k) ∈ Rm at time k. This constrained minimization problem is usually solved as a strong constraint problem using the standard Lagrangian multiplier method. The resulting framework has come to be known as the adjoint method (see Lewis et al. 2006, chapters 22–24).

3. Fine structure of adjoint sensitivity

Instead of determining the adjoint sensitivity through use of the tangent linear model as typically done in the standard 4D-Var mechanics (see Lewis et al. 2006, chapter 24), we derive an alternate expression for δx(k) in terms of the forecast sensitivity to control and the associated incremental change in control.

a. Alternate expression for δx(k)

The first-order Taylor expansion for δx(k) is given by

 
δx(k)=U(k)δx(0)+V(k)δα,
(3.1)

where the dynamics of evolution of the forward sensitivity matrices U(k) and V(k) defined in (2.3b) are given in the following (see Lakshmivarahan et al. 2017, chapters 2 and 3):

 
U(k+1) = DM(k)U(k), with initial condition U(0) = I,
(3.2)
 
V(k+1) = DM(k)V(k) + DMα(k), with initial condition V(0) = 0.
(3.3)
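The recursions (3.2) and (3.3) can be sketched for a scalar model (n = p = 1), in which case U(k) and V(k) are scalars; the one-step map M(x, α) = x + Δt·α(1 − x) and all numerical values below are illustrative assumptions, and a finite-difference perturbation of x(0) is used to check U(k):

```python
dt = 0.1  # hypothetical time step

def M(x, a):
    # Hypothetical one-step map M(x, alpha) = x + dt * alpha * (1 - x).
    return x + dt * a * (1.0 - x)

def sensitivities(x0, a, K):
    """Evolve U(k) = dx(k)/dx(0) and V(k) = dx(k)/dalpha alongside x(k)."""
    x, U, V = x0, 1.0, 0.0          # U(0) = I, V(0) = 0
    for _ in range(K):
        DM = 1.0 - dt * a           # Jacobian wrt the state at x(k)
        DMa = dt * (1.0 - x)        # Jacobian wrt the parameter at x(k)
        U, V = DM * U, DM * V + DMa # Eqs. (3.2) and (3.3)
        x = M(x, a)
    return x, U, V

x, U, V = sensitivities(x0=0.5, a=0.3, K=20)

# Finite-difference check of U: perturb x(0) and rerun the model.
eps = 1e-6
xp = 0.5 + eps
for _ in range(20):
    xp = M(xp, 0.3)
U_fd = (xp - x) / eps
```

For this (affine-in-x) map the recursion gives U(20) = (1 − Δt·α)^20 exactly, and the finite-difference estimate agrees to rounding error.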

b. An alternate expression for the forecast error e¯(k)

Using (2.2), we get an expression for the actual forecast error as

 
e¯(k) = z(k) − h[x(k)] = h[x¯(k)] − h[x(k)] + ξ(k).
(3.4)

But using the expression for δx(k) in (2.5), a first-order approximation e(k) to e¯(k) is given by

 
e(k) = h[x(k) + δx(k)] − h[x(k)] + ξ(k) = Dh(k)δx(k) + ξ(k),
(3.5)

where the Jacobian Dh(k) ∈ Rm×n is given in (2.3a). Further, substituting (3.1) in (3.5) and regrouping, we get an alternate expression for e(k) as

 
e(k)=A(k)δx(0)+B(k)δα+ξ(k),
(3.6a)

where A(k) = Dh(k)U(k), B(k) = Dh(k)V(k).

Indeed, the correction δc to the control c needed to reduce the forecast error (modulo observation noise) can be readily obtained by solving a linear least squares problem (Lakshmivarahan et al. 2017, chapter 2) that results from rewriting (3.6a) as

 
e(k)=S(k)δc+ξ(k),
(3.6b)

where S(k) = [A(k), B(k)] ∈ Rm×(n+p).
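A hedged sketch of the linear least squares problem (3.6b) for hypothetical sizes m = 3 and n + p = 2 (the entries of S and the true δc are made up for illustration, and the noise term is omitted for clarity); the correction is recovered by solving the normal equations S^T S δc = S^T e directly:

```python
def lstsq_2col(S, e):
    # Solve min ||S dc - e|| for a matrix S with two columns via the
    # normal equations: build S^T S (2x2) and S^T e (2-vector), then
    # invert the 2x2 system by Cramer's rule.
    a = sum(r[0] * r[0] for r in S)
    b = sum(r[0] * r[1] for r in S)
    d = sum(r[1] * r[1] for r in S)
    p = sum(r[0] * ei for r, ei in zip(S, e))
    q = sum(r[1] * ei for r, ei in zip(S, e))
    det = a * d - b * b
    return [(d * p - b * q) / det, (a * q - b * p) / det]

S = [[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]]   # hypothetical S = [A, B]
dc_true = [0.5, -0.25]                     # hypothetical control error
e = [r[0] * dc_true[0] + r[1] * dc_true[1] for r in S]  # noiseless e = S dc
dc = lstsq_2col(S, e)
```

In this noiseless sketch the recovered δc matches the one used to build e, illustrating how the forecast error pins down the control correction when S has full column rank.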

Remark 3.1: In the 4D-Var framework, the adjoint sensitivity is computed using the actual forecast error, e¯(k) in (3.4). Refer to chapter 24 in Lewis et al. (2006) for details. However, in this alternate approach, we used the first-order approximation e(k) in (3.5). When h(x) is linear, that is, h(x) = Hx, then e¯(k)=e(k). This alternate method for quantifying the forecast error is known as the first-order forward sensitivity method (FSM) and is closely related to the 4D-Var formulation. For details refer to Lakshmivarahan and Lewis (2010) and the monograph Lakshmivarahan et al. (2017).

c. Fine structure of adjoint sensitivity

It can be verified that the exact first variation δJe in J(c) resulting from the initial variation δc in c is given by (see Lewis et al. 2006, chapter 24)

 
δJe = −⟨Dh^T(k)R^−1(k)e¯(k), δx(k)⟩.
(3.7)

Replacing e¯(k) in (3.7) by e(k), we get an approximation δJ to δJe given by

 
δJ = −⟨Dh^T(k)R^−1(k)e(k), δx(k)⟩.
(3.8)

Substituting e(k), A(k), and B(k) from (3.6) and δx(k) from (3.1) into (3.8) and regrouping, we express δJ as the sum of the deterministic component δJd and the stochastic component δJn, which are given as follows:

 
δJd = −⟨H¯(k)[U(k)δx(0) + V(k)δα], [U(k)δx(0) + V(k)δα]⟩,
(3.9a)
 
δJn = −⟨Dh^T(k)R^−1(k)ξ(k), [U(k)δx(0) + V(k)δα]⟩.
(3.9b)

Define a symmetric matrix

 
H¯(k) = Dh^T(k)R^−1(k)Dh(k) ∈ R^{n×n}
(3.10)

that depends only on the observation system. Again, collecting the like terms and simplifying, it is immediate that

 
δJd = −⟨U^T(k)H¯[U(k)δx(0) + V(k)δα], δx(0)⟩ − ⟨V^T(k)H¯[U(k)δx(0) + V(k)δα], δα⟩
(3.11a)

and

 
δJn = −⟨U^T(k)Dh^T(k)R^−1(k)ξ(k), δx(0)⟩ − ⟨V^T(k)Dh^T(k)R^−1(k)ξ(k), δα⟩.
(3.11b)

From first principles,³ we can readily identify the gradients of J:

 
∇x(0)J(c) = −{[U^TH¯U]δx(0) + [U^TH¯V]δα} − U^TDh^T(k)R^−1(k)ξ(k),
(3.12a)
 
∇αJ(c) = −{[V^TH¯U]δx(0) + [V^TH¯V]δα} − V^TDh^T(k)R^−1(k)ξ(k),
(3.12b)

where

 
H^(k) = Dh^T(k)R^−1(k) ∈ R^{n×m}
(3.13)

These relations clearly bring out the impact on the adjoint sensitivities of the interaction resulting from simultaneous variation of both the initial condition and the parameters, as well as of the observation noise covariance.

Setting

 
∇cJ = [∇x(0)J^T, ∇αJ^T]^T and δc = [δx^T(0), δα^T]^T,

(3.11)–(3.13) can be succinctly expressed as

 
∇cJ = ∇cJd + ∇cJn, where ∇cJd = −[F^TH¯F]δc and ∇cJn = −F^TDh^T(k)R^−1(k)ξ(k),
(3.14)

where

 
F = [U, V] ∈ R^{n×(n+p)}, F^T = [U^T; V^T] ∈ R^{(n+p)×n} (with the blocks U^T and V^T stacked vertically),
(3.15)

and

 
F^TH¯F = [U^T; V^T] H¯ [U, V] = [U^TH¯U, U^TH¯V; V^TH¯U, V^TH¯V] ∈ R^{(n+p)×(n+p)}
(3.16)

is the outer product matrix, which inherits the Gramian structure alluded to in the introduction. Stated in other words, there exists a linear relation between δc, the error in the control, and the deterministic part of the adjoint gradient ∇cJd, and a linear relation between the observation noise ξ(k) and the stochastic component ∇cJn. This new expression for the adjoint sensitivity clearly brings out (i) the role of the model through the forward sensitivities U and V on one hand, and (ii) the role of the observation system through H¯ = Dh^TR^−1Dh on the other. Indeed, this separability of the effects of the model and the observation system clearly provides the “fine structure” of the adjoint sensitivity that, to our knowledge, is hitherto unknown in the literature.
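The block Gramian (3.16) can be assembled numerically; in this sketch the sizes (n = 2, p = 1, m = 1) and all matrix entries are illustrative assumptions, not values from the text. With m = 1, H¯ = Dh^T R^−1 Dh is rank one, so G = F^T H¯ F = (1/σ²)(Dh F)^T(Dh F):

```python
sigma = 0.5                         # hypothetical observation noise std
Dh = [1.0, -2.0]                    # hypothetical 1x2 Jacobian of h
U = [[0.8, 0.1], [0.0, 0.9]]        # hypothetical 2x2 sensitivity to x(0)
V = [[0.3], [0.7]]                  # hypothetical 2x1 sensitivity to alpha

F = [U[i] + V[i] for i in range(2)]           # F = [U, V], a 2x3 matrix
w = [sum(Dh[i] * F[i][j] for i in range(2))   # w = Dh F, a 1x3 row
     for j in range(3)]
# G = F^T Hbar F = (1/sigma^2) w^T w, a 3x3 SPSD matrix of rank one.
G = [[w[i] * w[j] / sigma ** 2 for j in range(3)] for i in range(3)]

symmetric = all(abs(G[i][j] - G[j][i]) < 1e-12
                for i in range(3) for j in range(3))
trace_G = sum(G[i][i] for i in range(3))      # sum of eigenvalues of G
```

The symmetry check and the nonnegative trace illustrate the SPSD Gramian structure; the trace reappears in (4.6) as the squared Frobenius norm of F¯.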

Remark 3.2: The standard assumption about the observation noise ξ(k) is that it is zero mean, temporally uncorrelated, Gaussian with R(k) as its covariance at time k; that is, ξ(k) ~ N[0, R(k)]. Hence, from (3.14), the expected value and covariance (Cov) are given by

 
E(∇cJ) = ∇cJd
(3.17)

and

 
Cov(∇cJ) = E[(∇cJn)(∇cJn)^T] = F^TDh^TR^−1(k)R(k)R^−1(k)DhF = F^TDh^TR^−1(k)DhF = F^TH¯F
(3.18)

using (3.10) and (3.13). Henceforth, following the tradition in statistical physics (Tolman 2010), we invoke the principle of mean field analysis and concentrate only on the deterministic component ∇cJd of the adjoint gradient. For simplicity of notation, in the following we identify ∇cJ with ∇cJd. Further, appendix C clearly brings out the difference between the approach of this paper and a typical adjoint sensitivity analysis as in AH2007.

4. A tale of two vectors: δc and ∇cJ—Further analysis

An immediate import of the above analysis is the derivation of the simple but key relation in (3.14), where the matrix F^TH¯F maps the unknown control error vector δc in the control space R^{(n+p)} to the adjoint sensitivity ∇cJ in R^{(n+p)}.

As a prelude to further analysis, let us further simplify the notation by defining the following equivalences:

 
g = −∇cJ, f = δc, and G = F^TH¯F.
(4.1)

Consequently, from (3.14) we obtain a fundamental relation

 
g=Gf.
(4.2)

Remark 4.1: An estimate of the Hessian of J(c) at the minimum: It is well known that if A is an SPD matrix and b is a vector, then ∇Q(x) = Ax − b if and only if Q(x) = (1/2)x^TAx − b^Tx + d for some arbitrary constant d (Meyer 2000; Lewis et al. 2006, chapter 10). From (4.1) and (4.2), we get −∇cJ = Gf = G(c¯ − c), that is, ∇cJ = Gc − b, where b = Gc¯. Combining these two facts, we can locally approximate J(c) around c¯ by the quadratic functional J¯(c) = (1/2)c^TGc − b^Tc. Hence, the Hessian of J¯(c) is given by ∇²J(c) ≈ ∇²J¯(c) = G = F^TH¯F, which by (3.18) is also the covariance of ∇cJ. Clearly, this Hessian is independent of the observation values, and its condition number controls the shape of J(c).

The meaning and the power of this new relation in (4.2) is captured in Fig. 1, from which it is evident that we can express g as the sum of two orthogonal components:

 
g = g^ + g^⊥,
(4.3)

where g^ is the orthogonal projection of g along f and g^⊥ = g − g^.

Thus, when we use g routinely in the minimization of cost functional in the 4D-Var method (Lewis et al. 2006, chapters 22–24), it is now clear from Fig. 1 that it is not g, but only its component g^ that is responsible for bringing the reduction in the value of the cost functional and hence the forecast error. An important question is: what factors affect the magnitude of g^? The 4D-Var framework is silent on this question. Consequently, we now introduce a new measure for the effectiveness of an observation.

Definition 4.1: An observation z(k) ∈ R^m at time k is said to be effective if the magnitude ‖g^‖ of the orthogonal projection g^ is large. If ‖g^‖ is small, then the observation is said to be ineffective.

It turns out that ‖g^‖ can be small due to two mutually exclusive reasons. First, taking the norm of both sides of (4.2) and using (4.3), it is immediate that

 
‖g^‖ ≤ ‖g‖ ≤ ‖G‖ ‖f‖.
(4.4)

Hence, if the magnitudes of the elements Gij of the matrix G are simultaneously small, then ‖G‖ is small and hence ‖g^‖ is small.

Stated in other words, if we place the observations in locations where the model is least sensitive, then from (4.4) it follows that the norm ‖g‖ is small, and in and around these locations the cost functional has flat patches in the control space R^{n+p}. Consequently, if ‖g^‖ is small, then the gradient method will bring very little improvement to the forecast quality.

There is yet another situation where ‖g^‖ can be very small. This happens when the control error vector f lies either in the null space⁴ of G or very close to it. In such cases, g is either orthogonal or nearly orthogonal to f, and so ‖g^‖ is very small even if ‖g‖ is large.
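Both situations can be seen in a small two-dimensional sketch (the matrix G and the vectors below are hypothetical): for a rank-one SPSD Gramian, a control error f aligned with the range of G yields g^ = g, while an f in NULL(G) yields g ≡ 0 and hence an ineffective observation:

```python
def matvec(G, f):
    # g = G f
    return [sum(G[i][j] * f[j] for j in range(len(f))) for i in range(len(G))]

def project(g, f):
    """ghat = fhat <fhat, g>, the orthogonal projection of g along f."""
    ff = sum(fi * fi for fi in f)
    s = sum(gi * fi for gi, fi in zip(g, f)) / ff
    return [s * fi for fi in f]

G = [[1.0, 2.0], [2.0, 4.0]]   # hypothetical rank-one SPSD G = u u^T, u = (1, 2)
f_good = [1.0, 2.0]            # aligned with the range of G
f_null = [2.0, -1.0]           # lies in NULL(G), so G f_null = 0

g = matvec(G, f_good)
ghat = project(g, f_good)      # here ghat = g, since f_good is an eigenvector
g_null = matvec(G, f_null)     # the ineffective case: g vanishes identically
```

The sketch mirrors Fig. 1: only the component of g along f contributes, and an f in (or near) the null space of G leaves the cost functional locally flat.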

It is against this backdrop that we describe a framework in appendix A that directly relates the spectral properties of the matrix G to their attendant impact on g^.

Since f is unknown but a constant vector, from (A.19) we get a basic relation:

 
‖g^‖/‖f‖ = Σi di yi² ≥ min_{yi²≠0}{yi²} (Σi di).
(4.5)

From (A.20), we can express

 
Σi di = tr[G] = tr[F¯^TF¯] = ‖F¯‖_F²,
(4.6)

which is the square of the Frobenius norm of F¯.

Combining (4.5) and (4.6), it is now clear that we can maximize the lower bound on ‖g^‖ by maximizing the Frobenius norm of the sensitivity matrix F¯, and this is independent of f.
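The identity (4.6) is easy to verify numerically, since tr(F¯^TF¯) equals the sum of the squares of all entries of F¯; the 2 × 3 matrix below is an illustrative assumption:

```python
Fbar = [[0.5, -1.0, 2.0],
        [1.5,  0.0, -0.5]]   # hypothetical 2x3 matrix playing the role of F_bar

# G = Fbar^T Fbar, a 3x3 SPSD Gramian.
G = [[sum(Fbar[k][i] * Fbar[k][j] for k in range(2)) for j in range(3)]
     for i in range(3)]

trace_G = sum(G[i][i] for i in range(3))          # sum of eigenvalues of G
frob_sq = sum(x * x for row in Fbar for x in row)  # squared Frobenius norm
```

Since tr(G) is computable without any eigendecomposition, maximizing ‖F¯‖_F² over candidate observation locations is an inexpensive offline diagnostic.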

In section 5, we systematically explore the conditions for maximizing ‖g^‖ using (4.5) and (4.6).

5. Impact of observation placement

In exploring the properties of the projection g^, and consequently the shape of the cost functional J(c) through the dependence of its gradient on the forward sensitivities, we now move to an analysis of the expressions in (3.12)–(3.16).

  • Case 1: x(0) is varied and α is held constant.

    In this case, δα = 0 and from (3.14)(3.16) and  appendix B, 
    g = −∇x(0)J = [U^TH¯U]δx(0) = [U¯^TU¯]δx(0).
    (5.1)
    • Case a: m ≥ n and rank(U^TH¯U) = n (Meyer 2000). Then NULL(U^TH¯U) = {0}.

      From (4.5) and (4.6), it follows that 
      ‖g^‖/‖f‖ ≥ min_{yi²≠0}{yi²} Σj ‖U¯*j‖²,
      (5.2)

      where U¯*j is the jth column of U¯ and y = (y1, y2, …, yn)^T ∈ R^n are the coordinates of δx(0) in the orthogonal basis formed by the eigenvectors of the Gramian U^TH¯U. In (5.2), while we do not have any control over the magnitudes of the yi’s, we can maximize the second factor on the right-hand side of (5.2) by placing the observations where the sum of the squares of the norms of the columns of U¯ is a maximum.

    • Case b: m < n and rank(U^TH¯U) = m. Then NULL(U^TH¯U) has dimension (n − m). For any δc in this null space, g ≡ 0.

    Again from (4.5) and (4.6), it is immediate that 
    ‖g^‖/‖f‖ ≥ min_{yi²≠0}{yi²} Σj ‖U*j‖².
    (5.3)

    The second factor on the right-hand side of (5.3) can be maximized by placing the observations where the sum of the squares of the norms of the first m columns of U is a maximum.

  • Case 2: α is varied and x(0) is held constant.

    In this case δx(0) = 0 and from (3.14)–(3.16), g = −∇αJ = [V^TH¯V]δα, where we recall from appendix B that V^TH¯V = V¯^TV¯ ∈ R^{p×p}. Again, two cases arise depending on whether rank(V^TH¯V) = p or rank(V^TH¯V) < p. In the latter case, the null space of V^TH¯V has dimension greater than or equal to one, and for any δc in this null space, g ≡ 0.

    From (4.5) and (4.6), it can be verified that 
    ‖g^‖/‖f‖ ≥ min_{yi²≠0}{yi²} Σj ‖V¯*j‖².
    (5.4)

    By placing the observations in the time interval where the sum of the squares of the norms of the columns of V¯ is a maximum, we can enforce or guarantee a nonzero lower bound on the norm of the adjoint gradient.

  • Case 3: Both x(0) and α are varied.

    In this case, from (3.14)(3.16), we get 
    g = −∇cJ = (F^TH¯F)δc = (F¯^TF¯)δc.
    (5.5)
    From (4.5) and (4.6), we get 
    ‖g^‖/‖f‖ ≥ min_{yi²≠0}{yi²} (Σi di).
    (5.6)

    Hence, by placing observations in the time intervals containing the maximum of the sum of the squares of the norms of the columns of F¯, we bound the norm of g^ as far away from zero as is feasible.

As a guide to applying the theory developed above, we summarize the key steps as follows.

Algorithm for the placement of observations

  • Step 1: Pick a control c = [xT(0), αT]T and run the model (2.1) forward in time to generate the forecast trajectory x(k) for k ∈ [0, T] for some large T > 0.

  • Step 2: Compute the Jacobians: DM(k), DMα(k) and Dh(k) in (2.3) along the trajectory in step 1.

  • Step 3: Compute the evolution of the forward sensitivity fields U(k) and V(k) using linear, time-varying dynamics in (3.2) and (3.3).

  • Step 4: Compute G = F^TH¯F using (3.15) and (3.16).

  • Step 5: Plot the variation of one of the following quantities versus time: (i) ‖G‖_F², (ii) ‖G*j‖², or (iii) Gij².

  • Step 6: Identify small time intervals containing the maxima in the plots generated in step 5. By sorting the values of these maxima in decreasing order, we can then determine the number and location of the observations.

Clearly, those time intervals containing the maximum of one or more of the quantities in step 5 of the algorithm given above are the preferred locations for the placement of observations. The idea is that, if for any reason we cannot place an observation at the maximum, we have the additional leverage of placing it close to the maximum, within a small interval containing it.
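The six steps above can be sketched end to end for a scalar toy model (n = p = 1, linear h with Dh = 1 and R = 1, so tr G(k) = U²(k) + V²(k)); the one-step map and all numerical values are illustrative assumptions, not part of the examples in section 6:

```python
dt, a, x0, K = 0.1, 0.3, 0.5, 60   # hypothetical step, parameter, state, horizon

x, U, V = x0, 1.0, 0.0
trace_G = []
for k in range(K):                  # steps 1-4: forecast, Jacobians, U, V, G
    trace_G.append(U * U + V * V)   # tr[G(k)] for Dh = 1, R = 1
    DM = 1.0 - dt * a               # dM/dx for M(x, a) = x + dt * a * (1 - x)
    DMa = dt * (1.0 - x)            # dM/da
    U, V = DM * U, DM * V + DMa     # recursions (3.2) and (3.3)
    x = x + dt * a * (1.0 - x)

# Steps 5-6: rank the times by tr[G(k)] in decreasing order and keep the
# best few as candidate observation placements.
ranked = sorted(range(K), key=lambda k: trace_G[k], reverse=True)
best_times = ranked[:3]
```

For this particular map the sensitivity to the initial condition decays monotonically and dominates the trace, so the top-ranked placement is the initial time; richer models (such as the examples of section 6) produce interior maxima instead.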

6. Two examples

In this section, we illustrate the theory developed in the previous sections using two examples. The first is the 1D air–sea interaction problem (Lakshmivarahan and Lewis 2010), where the control consists of initial and boundary conditions and a parameter. The second is Saltzman’s now famous 7D low-order model for convection, called SLOM (7). It is not out of place to mention that it was the initial analysis of SLOM (7) in Saltzman (1962) that provided the key idea and the impetus for the discovery of the now famous Lorenz LOM (3) (Lorenz 1963) exhibiting deterministic chaos (Lorenz 1993, p. 137). Refer to the recent paper by Lakshmivarahan et al. (2019a) for more details on the relation between Saltzman (1962) and Lorenz (1963).

a. Air–sea interaction problem: 1D model

Let n = 1 and p = 2. Let α = (xs, β)TR2 be the set of two parameters. Consider the 1D model equation:

 
dx(t)/dt = β[xs − x(t)],
(6.1)

with x(0) as the initial condition. The control vector c = [x(0), αT]TR3.

This model describes the dynamics of evolution of the normalized temperature x(t) of the cold continental air as it moves with the prevailing wind over the warm ocean that is assumed to have a constant sea surface temperature xs. Clearly, xs denotes the boundary condition and β is known as the turbulent heat exchange parameter. See Lakshmivarahan and Lewis (2010) for more details.

The solution of (6.1) is given by

 
x(t) = x[t, x(0), xs, β] = [x(0) − xs]e^{−βt} + xs = xs − ηe^{−βt},
(6.2)

where η = xs − x(0). Exact expressions for the forward sensitivities U, V1, V2, the matrix F^TH¯F, and expressions for the components of the adjoint sensitivities are given in Table 1.
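As a numerical companion to Table 1, the forward sensitivities can be obtained by differentiating the solution (6.2) directly: U = ∂x/∂x(0) = e^{−βt}, V1 = ∂x/∂xs = 1 − e^{−βt}, and V2 = ∂x/∂β = ηt e^{−βt}. A sketch using the forecast control values quoted in the text below:

```python
import math

# Forecast control from the text: x(0) = 2.0, xs = 10, beta = 0.3.
x0, xs, beta = 2.0, 10.0, 0.3
eta = xs - x0   # eta = xs - x(0)

def U(t):
    return math.exp(-beta * t)            # dx/dx(0)

def V1(t):
    return 1.0 - math.exp(-beta * t)      # dx/dxs

def V2(t):
    return eta * t * math.exp(-beta * t)  # dx/dbeta

# Setting dV2/dt = 0 gives the unique maximizer of V2 at t = 1/beta.
t_star = 1.0 / beta
```

Setting dV2/dt = 0 gives the maximum of the parameter sensitivity at t = 1/β, consistent with the discussion of Fig. 8 at the end of this section.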

Table 1.

Sensitivity of the solution of Eq. (6.1).


A plot of the base solution x¯(t) of (6.1) starting from the base control c¯ = [x¯(0), x¯s, β¯]^T = (1.0, 11, 0.25)^T is given in Fig. 3. In this and the rest of the figures, the discrete time k corresponds to the continuous time t = kΔt, where Δt = 0.1. Let c = [x(0), xs, β]^T = (2.0, 10, 0.3)^T be the control for the forecast; the model forecast x(t) using (6.1) starting from c is plotted in Fig. 4. The evolution of the forward sensitivities of the solution of (6.1) for the same control as in Fig. 4 is given in Fig. 5.

Fig. 3.

The base solution x¯(t) of (6.1) starting from the base control c¯ = [x¯(0), x¯s, β¯]^T = (1.0, 11, 0.25)^T.


Fig. 4.

Forecast x(t) using (6.1) starting from the control c = [x(0), xs, β]^T = (2.0, 10, 0.3)^T.


Fig. 5.

Evolution of the forward sensitivities (a) U, (b) V1, and (c) V2 given in Table 1, starting from the control c as in Fig. 4.

The variation with time of the inner product of the time-varying (row) sensitivity vector F in Table 1 and the constant control error vector f = δc = c̄ − c is shown in Fig. 6. It can be verified from Fig. 6 that at a time close to t* ≈ 4.4 the sensitivity vector F is orthogonal to δc, as indicated by the vanishing of the inner product. That is, δc at this time instant lies in the null space of the rank-one, symmetric matrix G = FᵀH̄F given in Table 1. The variation of the norm of the gradient ||g|| given in Table 1 is illustrated in Fig. 7. Clearly, ||g|| vanishes at t* ≈ 4.4 and is small in its neighborhood.

Fig. 6.

Evolution of the inner product of F in Table 1 and a fixed δc = (−1.0, 1.0, −0.05)T.


Fig. 7.

Variation of the norm of adjoint gradient in Table 1 with time of observation.


The variation with time of the squares of the individual forward sensitivities U, V1, and V2 in Table 1 is shown in Fig. 8. Clearly, U² attains its maximum close to the initial time and V1² attains its maximum when t is large, whereas V2² attains its maximum at t** ≈ 1/β = 3.33 (since β = 0.3 is used for the forecast). Hence, by placing three observations, say at times t1 = 1, t2 = 20, and t3 = 3.3, we can capture information on the initial condition x(0), the boundary condition x_s, and the parameter β. Combining (4.6) with the expressions in Table 1, it can be verified (assuming σ² = 1) that tr(G) = tr(FᵀH̄F) = U² + V1² + V2². Since U, V1, and V2 vary in magnitude, to bring out the key features in the sum, a plot of the scaled version (1/50)(U² + V1² + V2²) against time is given in Fig. 9. One can pick an equivalent set of three time instants for placing observations using the plot in Fig. 9.
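The placement rule just described can be sketched numerically: scan the squared sensitivities along the forecast trajectory and read off the time at which each attains its maximum. The grid resolution and names below are our own choices:

```python
import math

def sens(t, x0=2.0, xs=10.0, beta=0.3):
    """Forward sensitivities (U, V1, V2) of (6.2) at the forecast control."""
    eta = xs - x0
    e = math.exp(-beta * t)
    return e, 1.0 - e, eta * t * e

# tr(G) = U^2 + V1^2 + V2^2 for a single observation at time t (sigma^2 = 1)
ts = [0.01 * k for k in range(1, 2001)]          # t in (0, 20]
trace = [sum(s * s for s in sens(t)) for t in ts]

# U^2 peaks near t = 0, V1^2 for large t, and V2^2 at t = 1/beta,
# since t * exp(-beta * t) is maximized at t = 1/beta.
t_v2 = max(ts, key=lambda t: sens(t)[2] ** 2)
print(t_v2)   # close to 1/0.3 = 3.33
```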

Fig. 8.

Time evolution of U², V1², and V2² given in Table 1.

Fig. 9.

Evolution of the sum of U², V1², and V2² given in Table 1.

We conclude this discussion by reporting in Table 2 the results of two experiments. In the first experiment, four observations are chosen at times where the sensitivities are large; in the second, at times where the sensitivities are small. The entries in the last column of Table 2 clearly illustrate the theory developed in this paper.

Table 2.

Two experiments about the air–sea interaction problem.


b. Application to Saltzman’s Model: SLOM (7)

The strategy for the numerical experiments follows the "identical twin" approach, in which two sets of model controls are specified: in this case of Saltzman's spectral model, the Rayleigh number (Rayleigh 1916) and the model's initial condition. One set is labeled the "true control," and the other set, purposely made different from the true control, is labeled the "forecast control." Variational data assimilation is used to correct the forecast control and bring it closer to the true control. The theme of the experiment is to choose observation points that guarantee avoidance of "flatness" of the cost function in the space of control. The choice of these points rests on the foregoing theory that identifies a matrix and its trace. The location of these points is based on a two-part process:

1) Part 1: Choice of location

(i) Stage 1: Controls specified

The true control c̄ and forecast control c are specified. The evolutions of the true state and the forecast state are followed for a sufficiently long time to note differences in the convective regimes (as in step 1 of the algorithm in section 5).

(ii) Stage 2: Determination of forecast sensitivities and associated G matrix

Forecast sensitivities to the control c are found through solution of the augmented set of governing equations for Saltzman's model. From these sensitivities and the forward operator, the operator that determines the model counterpart of observations [see section 6b(4) below], the G matrix and tr(G) are determined. In the computation of G and its trace, only the forward sensitivities to control and the model counterpart of the observations are used; actual observations are not needed.

(iii) Stage 3: Observation placement

Observation locations are chosen in the (x, z, t) space where centroids of large values of tr(G) are the ideal locations.

2) Part 2: Subsequent data assimilation

(i) Stage 4: Cost function J

The cost function J is defined as the sum of squared differences between the observations and the associated forecasts at the points in time and space where observation locations are chosen (observations at that point in time are based on true control and forecasts are based on erroneous control).

(ii) Stage 5: Variational data assimilation

The variational data assimilation method is used to adjust control and bring forecast control closer to true control.

Remark 6.1: Knowledge of the sensitivities found in stage 2 allows calculation of the cost-functional gradient without adjoint equations, an alternate method of 4D-Var. This alternate method is used in the numerical experiments that follow.
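Remark 6.1 can be illustrated with a generic sketch of this forward-sensitivity route to the gradient. Assuming a cost of the usual least squares form J(c) = ½Σ_k e(k)ᵀR⁻¹e(k) with e(k) = z(k) − h[x(k)], the chain rule through the forward sensitivities F(k) = ∂x(k)/∂c yields the gradient with only forward model information. The helper and the scalar check below are our own illustrative constructions, not the paper's code:

```python
import numpy as np

def gradient_via_forward_sensitivity(F_list, Dh_list, Rinv, e_list):
    """Gradient of J(c) = 0.5 * sum_k e(k)^T R^{-1} e(k), e(k) = z(k) - h[x(k)],
    computed through the forward sensitivities F(k) = dx(k)/dc:
        grad J = - sum_k F(k)^T Dh(k)^T R^{-1} e(k),
    with no adjoint (backward) integration required."""
    g = np.zeros(F_list[0].shape[1])
    for F, Dh, e in zip(F_list, Dh_list, e_list):
        g -= F.T @ Dh.T @ Rinv @ e
    return g

# Toy check: scalar linear model x(k) = a^k * x0, control c = x0, h(x) = x
a, x0_true, x0 = 0.9, 1.0, 1.3
Rinv = np.array([[2.0]])
ks = [1, 2, 3]
z = [np.array([a**k * x0_true]) for k in ks]     # noise-free observations

def J(c):
    s = 0.0
    for i, k in enumerate(ks):
        e = z[i] - np.array([a**k * c])
        s += 0.5 * float(e @ Rinv @ e)
    return s

F_list = [np.array([[a**k]]) for k in ks]        # dx(k)/dx0 = a^k
Dh_list = [np.eye(1) for _ in ks]
e_list = [z[i] - np.array([a**k * x0]) for i, k in enumerate(ks)]

g = gradient_via_forward_sensitivity(F_list, Dh_list, Rinv, e_list)
g_fd = (J(x0 + 1e-6) - J(x0 - 1e-6)) / 2e-6      # finite-difference check
assert abs(g[0] - g_fd) < 1e-5
```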

3) Governing equations

Saltzman (1962) assumed a seven-mode spectral model of convection that obeys the Boussinesq (1903) equations describing “roll” convection between two free surfaces maintained at constant temperature difference. Convection takes place in a thin layer of fluid heated from below. The convective motions take place in the xz plane; that is, there is no variation in the y direction where x and y represent the horizontal Cartesian axes. The vertical coordinate z is perpendicular to the horizontal plane (positive upward). The momentum, mass, and thermodynamic equations can be reduced to two equations in vorticity and temperature departure—departure from the linear base-state temperature profile. These equations are found in Saltzman [1962, Eqs. (16) and (17)].

A double Fourier series form of the spectral solution is assumed, with three streamfunction modes and four temperature-departure modes, given in Table 3. Saltzman used an alphabetic representation for the seven spectral amplitudes (A, B, …, G) (section 7 in Saltzman 1962), whereas we adopt the representation X1 to X7 for the amplitudes.

Table 3.

Seven spectral components of double Fourier series.


The following trigonometric arguments appear in the spectral forms: πax, πbx, and πcx, with a = 1/√2, b = 2√2/3, and c = √2/6, which respectively account for three waves, four waves, and one wave over the nondimensional horizontal length 6√2 of the domain; the arguments πz and 2πz represent a half wave and a full wave over the depth of the fluid, where the nondimensional depth-to-length ratio is 1/(6√2).

Substitution of the spectral forms of the solution into the Boussinesq equations in Table 4 leads to a set of coupled ordinary differential equations governing the spectral amplitudes. These equations and the associated coefficients are found in Tables 5 and 6. The coefficients in Table 6 are expressed in terms of the two nondimensional numbers associated with this convection problem: the Rayleigh number (λ) and the Prandtl number (σ). The Rayleigh number is expressed as a constant λ multiplying the critical Rayleigh number in Saltzman's development [657.511 for this case of two free boundaries, as derived in Chandrasekhar (1961), which followed Rayleigh's pioneering development in 1916]. Saltzman set σ = 10 in his study and we also set σ = 10 in ours, but only after we determined that the forecast sensitivity to σ was much less than the sensitivity to λ. Determination of the relative forecast sensitivities to these parameters dictated our derivation in terms of arbitrary values of the two numbers.

Table 4.

Boussinesq equations. These governing equations neglect variations in density except where they modify the action of gravity (Malkus and Veronis 1958; Chandrasekhar 1961, chapter 2).

Table 5.

Saltzman's (1962) model: SLOM (7) is Ẋ = f(X, α), where X ∈ R⁷, α = (λ, σ)ᵀ, and f = (f1, f2, …, f7)ᵀ.
Table 6.

Values of coefficients Cijk in Table 5.


In this study, the true control vector is taken to be λ = 2 together with Saltzman's specified initial conditions. The forecast control vector is taken to be λ = 2.1 together with Saltzman's specified initial conditions, except for the temperature-departure amplitude X4(0), which is taken to be 0.1 instead of 0.0. It is sufficient to choose this single amplitude initial condition, different from the true control, to reveal the action of the variational data assimilation process on the correction of the initial condition.

In addition to the spectral equations for convection, two sets of equations governing the forecast sensitivities to control are derived, namely, equations governing ∂Xi/∂λ and ∂Xi/∂X4(0), i = 1, 2, …, 7. Thus, there are 21 coupled equations, where the solutions of the Xi(t) equations feed into the time evolution of the 14 sensitivity equations but the solutions of the sensitivity equations do not feed back into the solutions of the Xi(t) equations. The graphs of the amplitude variation for the true and forecast controls are shown in Fig. 10. Figures 11 and 12 show the (x, z) distribution of the true and forecasted vertical velocity and temperature-departure variables (w, θ) at t = 0.65, the time when data assimilation operates. Note that the phase for wavenumber 3 is reversed between the true and forecasted states in Figs. 11 and 12, consistent with the time evolution of the amplitudes shown in Fig. 10.
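The feed-forward structure described above (the state equations drive the sensitivity equations, with no feedback from the sensitivities to the state) can be sketched on a scalar toy model; the logistic-type equation below stands in for Saltzman's equations purely for illustration:

```python
# Augmented integration of a state x and its sensitivities dx/dlam and
# dx/dx(0) for the toy model dx/dt = lam * x * (1 - x). The sensitivity
# tendencies use the state but never feed back into it.
def rhs(x, s_lam, s_x0, lam):
    fx = lam * x * (1.0 - x)            # state tendency f(x, lam)
    dfdx = lam * (1.0 - 2.0 * x)        # Jacobian df/dx along the trajectory
    dfdlam = x * (1.0 - x)              # df/dlam
    return fx, dfdx * s_lam + dfdlam, dfdx * s_x0

def integrate(x0, lam, dt=1e-3, nsteps=1000):
    x, s_lam, s_x0 = x0, 0.0, 1.0       # sensitivities start at 0 and 1
    for _ in range(nsteps):
        fx, f_lam, f_x0 = rhs(x, s_lam, s_x0, lam)
        x, s_lam, s_x0 = x + dt * fx, s_lam + dt * f_lam, s_x0 + dt * f_x0
    return x, s_lam, s_x0

# The augmented Euler recursion is exactly the derivative of the state
# recursion, so finite differences of trajectories reproduce it tightly.
x, s_lam, s_x0 = integrate(0.1, 2.0)
h = 1e-5
fd_lam = (integrate(0.1, 2.0 + h)[0] - integrate(0.1, 2.0 - h)[0]) / (2 * h)
fd_x0 = (integrate(0.1 + h, 2.0)[0] - integrate(0.1 - h, 2.0)[0]) / (2 * h)
assert abs(s_lam - fd_lam) < 1e-6 and abs(s_x0 - fd_x0) < 1e-6
```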

Fig. 10.

Time evolution of true and forecasted spectral amplitudes.


Fig. 11.

The (x, z) distribution of (top) true vertical motion and (bottom) temperature departure at t = 0.65.


Fig. 12.

The (x, z) distribution of (top) forecasted vertical motion and (bottom) temperature departure at t = 0.65.


4) The experiment

The forward operator H, a matrix of dimension 3 × 7 for this model, represents the spatial form of the (u, w, θ) variables in the (x, z) space (refer to Table 7 for details). When H multiplies the 7 × 1 time-dependent vector X = (X1, X2, …, X7)ᵀ of Fourier amplitudes, the three rows of the resulting product represent the (x, z, t) structure of the (u, w, θ) model variables, that is, the state of convection in space and time. When the H matrix multiplies the F matrix, a 7 × 2 matrix of forecast sensitivities whose two columns are the sensitivities to the initial condition X4(0) and to the Rayleigh number (expressed in terms of λ), respectively, the result is a 3 × 2 matrix giving the (x, z, t) representation of the six sensitivity elements, the separate first derivatives of the variables with respect to the two control parameters. In matrix form, HX is the state vector and HF is the sensitivity matrix. The square of the sensitivity matrix, (HF)ᵀ(HF), is the G matrix, and tr(G) is its trace.

Table 7.

Elements of the matrix H.


The variational data assimilation process takes place at t = 0.65, the point in time where initial perturbations in the Fourier convective components have started to grow significantly. The cost function J over the (x, z) space at this time is also calculated. The trace tr(G) is plotted in Fig. 13. A point at the maximum of the tr(G) centroid near x = 6.0 and z = 0.7 is taken as the location of observations of (u, w, θ); the maximum is located at x = 5.802 and z = 0.721.

Fig. 13.

Trace of G in the (x, z) space at t = 0.65.


The two-component gradient of J, namely the gradient of J with respect to λ and the gradient of J with respect to the initial condition X4(0), is found by taking the derivative of the cost function with respect to these controls separately. These derivatives can be calculated easily since the sensitivities of the variables to control are available from the solution of the amplitude equations discussed earlier. The structure of these two gradients over the (x, z) space at t = 0.65 is shown in Figs. 14 and 15.

Fig. 14.

The cost function gradient with respect to initial condition at t = 0.65.


Fig. 15.

The cost function gradient with respect to λ at t = 0.65.


The values of the gradients at the observation point are ∂J/∂X4(0) = 4.893 99 × 10⁵ and ∂J/∂λ = 4.178 00 × 10⁵. Thus, from the operating point in the space of control [X4(0), λ] = (0.1, 2.1), the negative gradient of J is directed downward at an angle of 49.51° (slope of 1.17). With a step size of 0.150 along this direction, the control adjustment is [ΔX4(0), Δλ] = (−0.097, −0.110), close to the optimal adjustment; only a 5° difference exists between the two vectors. The adjusted forecast control [X4(0), λ] = (0.003, 1.99) is very close to the true control.

In summary, the tr(G) structure identified points in (x, z) space at t = 0.65 that were likely to have large values of the J gradient. When observations of the three convection variables (the two velocity components and the temperature-departure component) are placed at one of the centroids of the tr(G) map, the gradients of the cost function are found to be large and of roughly the same magnitude. The direction of the negative gradient is very close to the optimal direction. Examination of the gradient maps in Figs. 14 and 15 makes it clear that there are many observation locations [away from the centroids of tr(G)] that deliver gradients much smaller, and in directions not pointed toward the J minimum, compared with those associated with points near the centroids.

7. Discussion and conclusions

Whereas traditional 4D-Var is grounded in the use of the adjoint model to deliver the gradient of the cost function (the so-called adjoint sensitivity), the alternate method developed here is founded on the role of the forecast sensitivity to control and an expression that defines the forecast error in terms of the observation error and the model counterpart of observations. The separability of the influence of the model and of the observation system on the adjoint sensitivity, as brought out in (4.1), presents a view that enriches the traditional one. We emphasize that since the error f is not known in advance, we use the fundamental relation in (4.1) to decide on the placement of the observations as described above.

The traditional 4D-Var method provides a computationally efficient framework for computing the adjoint gradient for use in forecast error correction. Within this classical framework, the prevailing practice is to use different types of observations without much consideration of their optimal location. And as the number of observations increases, the computational demands of assimilating them also increase. Our alternate approach presents an offline strategy based on the forward sensitivity field in the control space. By selectively placing observations where the model exhibits large sensitivity, we can increase the effectiveness of the observations without having to use many observations in the assimilation process.

Our approach also clearly shows that it is not simply the magnitude of the adjoint gradient, but its (orthogonal) projection along the unknown control error vector δc, that is solely responsible for the reduction in the forecast error. It is shown that the magnitude of this projection is related to the inner product of the adjoint sensitivity and the unknown error in the control vector. This inner product is directly related to the Frobenius norm of the forward sensitivity of the forecast with respect to the initial conditions and parameters. By examining variations of this norm in space and time, and by placing the observations at or near the maxima of this norm, the magnitude of the projection is increased. The effectiveness of this approach has been demonstrated with a one-dimensional air–sea interaction model (time dependence alone) and a two-dimensional model, Saltzman's nonlinear model of Bénard (1900, 1901) convection in space and time. The richness of Saltzman's treatment of convection, so faithful to the work of Bénard and Rayleigh, was a pedagogical triumph, central to Ed Lorenz's monumental study of deterministic chaos and central to our understanding of the interplay between dynamics and observations in data assimilation.

The strategy of choosing individual observations in the vicinity of the Frobenius norm maxima is straightforward and logical, but finer "granularity" (resolution) can be attained by the following processes: 1) in the presence of an inordinate number of Frobenius norm maxima, rank order them by the value of the maximum and decide on a suitable number of observations; 2) examine the square of the norm of each of the n columns of U against time and space, identify the maxima in each field, and then, again by rank ordering these maxima, decide on a suitable number of observations to use; and 3) the finest level of granularity is attained when we examine the variation of the square of each of the (n + p) × m elements of Ū, identify and rank order the maxima, and come up with an acceptable number of observations to use in the assimilation scheme.

We hasten to add that the process of computing the forward sensitivities involves solving time-consuming matrix recurrence equations in (3.2) and (3.3), where each step involves O(n³) operations (see Lakshmivarahan et al. 2017). There are two avenues for conquering this computational challenge. The first is to invoke parallel implementation of the recurrence that defines the forward sensitivity dynamics. A little reflection suggests that Eqs. (3.2) and (3.3) admit "embarrassingly" parallel implementations. Once the Jacobians DM(k) in (3.2) are computed along the forecast trajectory (a common denominator in all the adjoint-based approaches), we can split the matrix–matrix recurrence in (3.2) into a system of n matrix–vector recurrences across the n columns of U(k) and implement each of these matrix–vector recurrences in parallel, requiring O(n²) operations per iteration. With this approach, the computation time of the parallel implementation would match that of running the tangent linear system. The second idea is to consider a multiscale, multigrid approach wherein a relatively coarse grid is used to compute the forward sensitivity matrices that decide the location and number of observations, and a finer grid is then used to assimilate the observations. Of course, we also have the choice to hybridize the multiscale, multigrid approach with parallel computing to make the overall process computationally feasible.
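The column-splitting argument can be sketched directly: the matrix–matrix recurrence U(k+1) = DM(k)U(k) decouples into one matrix–vector recurrence per column of U, each costing O(n²) per step. The thread-pool dispatch below is illustrative only (in practice the work would go to BLAS threads or a distributed scheme):

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def propagate_columns(DM_seq, U0, workers=4):
    """Propagate each column of U0 independently through the same
    sequence of Jacobians DM(0), DM(1), ..., an embarrassingly
    parallel split of the recurrence U(k+1) = DM(k) U(k)."""
    def one_column(u):
        for DM in DM_seq:
            u = DM @ u              # O(n^2) matrix-vector product per step
        return u
    with ThreadPoolExecutor(max_workers=workers) as ex:
        cols = list(ex.map(one_column, U0.T))
    return np.column_stack(cols)

# Check against the serial matrix-matrix recurrence
rng = np.random.default_rng(0)
n = 5
DM_seq = [0.3 * rng.standard_normal((n, n)) for _ in range(10)]
U_serial = np.eye(n)
for DM in DM_seq:
    U_serial = DM @ U_serial        # O(n^3) per step
U_parallel = propagate_columns(DM_seq, np.eye(n))
assert np.allclose(U_serial, U_parallel)
```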

Acknowledgments

This paper is a condensed version of a two-part paper (Lakshmivarahan et al. 2019b; Lewis et al. 2019) submitted to J. Atmos. Sci. The challenge of combining these two papers was made manageable through the advice and input of the editor-in-charge and the valued suggestions of a large panel of reviewers. We are indebted to these individuals for helping us create a better and more readable manuscript. Preliminary ideas related to the developments in this paper were presented (by S. L.) in a July 2018 seminar at the Center for Atmospheric and Oceanic Sciences (CAOS) and the Divecha Centre for Climate Change (DCCC) at the Indian Institute of Science. We are grateful to all the participants for the interaction and to Ashwin Seshadri, J. Srinivasan, and S. K. Satheesh for the hospitality and keen interest.

APPENDIX A

Dependence of ĝ on the Spectral Properties of G = FᵀH̄F

In this appendix we explore the effectiveness of an observation by analyzing the intrinsic dependence of ĝ, the projected component of the adjoint gradient g = Gf along the control error vector f = δc. We start by listing some key properties of the linear transformation defined by G.

Property A1: Referring to the developments in section 3 and appendix B, it follows that the matrix G is a Gramian, given by

 
G = FᵀH̄F = F̄ᵀF̄ ∈ R^((n+p)×(n+p)),
(A.1)

and is SPSD (Meyer 2000).

Property A2: Since G is SPSD, the inner product of f and g is nonnegative. It follows from the definition of SPSD (Meyer 2000) that

 
⟨f, g⟩ = ⟨f, Gf⟩ = fᵀGf = ||f||²(f̂ᵀGf̂) ≥ 0,
(A.2)

where f̂ = f/||f|| is the unit vector in the direction of f.

Property A3: Referring to chapter 6 in Lewis et al. (2006), the orthogonal projection matrix P_f onto the span of the vector f is given by

 
P_f = f(fᵀf)⁻¹fᵀ = ffᵀ/||f||² = f̂f̂ᵀ.
(A.3)

Property A4: The orthogonal projection ĝ of g along f is given by (Lewis et al. 2006, chapter 6)

 
ĝ = P_f g = f̂f̂ᵀg = f̂f̂ᵀ(Gf) = f(f̂ᵀGf̂).
(A.4)

Property A5: Since f and f̂ are fixed, the properties of the scalar

 
Ω(G) = f̂ᵀGf̂,
(A.5)

called the Rayleigh coefficient (Meyer 2000), depend critically on the spectral expansion of the SPSD matrix G.

Property A6: Since G is SPSD, the spectral expansion of G is given by (Meyer 2000)

 
G = QDQᵀ,
(A.6)

where Q and D are the matrices of eigenvectors and eigenvalues of G, respectively, with QQᵀ = QᵀQ = I_(n+p). Substituting (A.6) in (A.5) and defining

 
y = Qᵀf̂,
(A.7)

we get

 
Ω(G) = f̂ᵀGf̂ = f̂ᵀ(QDQᵀ)f̂ = yᵀDy = Σᵢ dᵢyᵢ²,
(A.8)

where y is a unit vector since

 
||y||² = yᵀy = f̂ᵀ(QQᵀ)f̂ = f̂ᵀf̂ = ||f̂||² = 1.
(A.9)

Stated in other words, the components of y are the coordinates of the unit vector f̂ in the new orthogonal coordinate system formed by the columns of Q.

In view of (A.9), the Rayleigh coefficient Ω(G) has a natural interpretation as the weighted average of the eigenvalues of G.
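This interpretation is easy to verify numerically for a random Gramian; a brief sketch (all names and sizes are our choices):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
A = rng.standard_normal((n, n))
G = A.T @ A                        # an SPSD Gramian, as in (A.1)

f = rng.standard_normal(n)
f_hat = f / np.linalg.norm(f)      # unit vector along the control error

d, Q = np.linalg.eigh(G)           # eigenvalues d_i >= 0, orthonormal Q
y = Q.T @ f_hat                    # coordinates of f_hat in the eigenbasis (A.7)

omega = f_hat @ G @ f_hat          # Rayleigh coefficient (A.5)
assert abs(np.sum(y**2) - 1.0) < 1e-12        # (A.9): y is a unit vector
assert abs(omega - np.sum(d * y**2)) < 1e-9   # (A.8): weighted eigenvalue average
assert omega >= 0.0                           # (A.2): G is SPSD
```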

Property A7: Using (A.2) in (A.4), we obtain another natural expression for the projection ĝ:

 
ĝ = f⟨f, g⟩/||f||² = f̂⟨f̂, g⟩.
(A.10)

A geometric view of the relation among f, g, and ĝ is given in Fig. 1. Referring to Fig. 1, we can resolve g as the sum

 
g = ĝ + g⊥,
(A.11)

where ĝ is the orthogonal projection of g onto f and g⊥ = g − ĝ.

The overall effectiveness of the forecast error correction strategy depends critically on ĝ through its length ||ĝ||. Clearly, referring to Fig. 1, we can maximize ||ĝ|| by maximizing ||g|| and minimizing the angle θ between f and g, which is equivalent to maximizing the inner product ⟨f, g⟩.

Property A8: Combining the eigendecomposition of G in (A.6) with (A.2) and (A.7), we get the following new expression:

 
g = Gf = (QDQᵀ)f = ||f||(QD)Qᵀf̂ = ||f||(QD)y.
(A.12)

Consequently,

 
||g||² = gᵀg = ||f||²yᵀD(QᵀQ)Dy = ||f||²yᵀD²y = ||f||²Σᵢ dᵢ²yᵢ²,
(A.13)

where the second factor on the right-hand side of (A.13) is the weighted average of the squares of the eigenvalues of G.

Property A9: Combining (A.4), (A.6), and (A.9), we get an expression for ĝ:

 
ĝ = f(f̂ᵀGf̂) = f[f̂ᵀ(QDQᵀ)f̂] = f(yᵀDy).
(A.14)

Hence,

 
||ĝ||² = ĝᵀĝ = ||f||²(yᵀDy)² = ||f||²(Σᵢ dᵢyᵢ²)²,
(A.15)

where the second factor on the right-hand side of (A.15) is the square of the weighted average of the eigenvalues of G.

Property A10: Considering {yᵢ²} as a discrete probability distribution over the dᵢ, define

 
d̄ = Σᵢ dᵢyᵢ².
(A.16)

Then, from

 
0 ≤ Σᵢ (dᵢ − d̄)²yᵢ² = Σᵢ dᵢ²yᵢ² − d̄²,
(A.17)

we obtain the natural inequality

 
||g||² ≥ ||ĝ||²,
(A.18)

which states that the length of the vector g is always greater than or equal to that of its orthogonal projection ĝ.

The above analysis naturally leads to the following inescapable conclusions:

Property A11: Increasing ||ĝ|| has the immediate consequence of (i) increasing the effectiveness of the forecast error correction process and (ii) increasing ||g||, which in turn controls the shape of the cost functional J(c) by avoiding flat patches. Hence the title of the paper.

As a prelude to increasing ||ĝ||, from (A.15) first observe that for a given fixed f,

 
||ĝ||/||f|| = Σᵢ dᵢyᵢ² ≥ min_(yᵢ²≠0){yᵢ²}(Σᵢ dᵢ).
(A.19)

From (A.6), exploiting the properties of the trace of a product of matrices (Meyer 2000), it can be verified that

 
tr(G) = tr(QDQᵀ) = tr(D) = Σᵢ dᵢ = tr(F̄ᵀF̄) = ||F̄||²_F = Σⱼ ||F̄∗ⱼ||²,
(A.20)

where ||A||_F denotes the Frobenius norm of A (Meyer 2000) and F̄∗ⱼ denotes the jth column of F̄. Hence, by placing observations where the Frobenius norm of the forward sensitivity matrix F̄ is a maximum, we can achieve the twin objectives stated in property A11.
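The chain of identities in (A.20), together with the projection inequality (A.18), can be checked directly; the random stand-in for F̄ below is ours:

```python
import numpy as np

rng = np.random.default_rng(2)
Fbar = rng.standard_normal((4, 3))      # stand-in for the scaled sensitivity
G = Fbar.T @ Fbar                       # Gramian G = Fbar^T Fbar

# (A.20): tr(G) = sum of eigenvalues = squared Frobenius norm of Fbar
d = np.linalg.eigvalsh(G)
fro2 = np.linalg.norm(Fbar, "fro") ** 2
col2 = sum(np.linalg.norm(Fbar[:, j]) ** 2 for j in range(Fbar.shape[1]))
assert abs(np.trace(G) - d.sum()) < 1e-9
assert abs(np.trace(G) - fro2) < 1e-9
assert abs(np.trace(G) - col2) < 1e-9

# (A.18): ||g|| >= ||g_hat|| for the projection of g = G f onto f
f = rng.standard_normal(3)
g = G @ f
g_hat = f * (f @ g) / (f @ f)           # projection (A.10)
assert np.linalg.norm(g) >= np.linalg.norm(g_hat) - 1e-12
```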

Property A12: From remark 4.1, it is immediate that the matrix G is also the Hessian of J(c) at the minimum c¯.

Clearly, the condition number of G controls the shape of the cost functional J(c) around its minimum.

APPENDIX B

Factorization of G = FᵀH̄F

Recall from (3.14) that G = FᵀH̄F, where F = [U, V] and H̄ = Dhᵀ(k)R⁻¹Dh(k). It is assumed that the noise covariance matrix R (and hence R⁻¹) is SPD. Accordingly, we can express R⁻¹ as the product of its Cholesky factors (Lewis et al. 2006, chapter 9):

 
R⁻¹ = ΣᵀΣ,
(B.1)

where Σ is an upper triangular matrix.

Substituting, we obtain the Gramian structure

 
G = FᵀH̄F = (FᵀDhᵀΣᵀ)(ΣDhF) = F̄ᵀF̄,
(B.2)

where F̄ = ΣDh(k)F.

In the first special case, when F = U (which occurs when δα = 0), we get

 
G = UᵀH̄U = (UᵀDhᵀΣᵀ)(ΣDhU) = ŪᵀŪ,
(B.3)

where Ū = ΣDh(k)U.

In the second special case, when F = V [which occurs when δx(0) = 0], we get

 
G = VᵀH̄V = (VᵀDhᵀΣᵀ)(ΣDhV) = V̄ᵀV̄,
(B.4)

where V̄ = ΣDh(k)V.
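The factorization can be verified numerically. One convention detail: numpy's Cholesky routine returns a lower triangular L with R⁻¹ = LLᵀ, so the upper triangular Σ of (B.1) is Lᵀ. The matrices below are random stand-ins for the quantities in the paper:

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, m = 5, 2, 3
F = rng.standard_normal((n, n + p))    # forward sensitivity F = [U, V]
Dh = rng.standard_normal((m, n))       # Jacobian of the observation operator

A = rng.standard_normal((m, m))
R = A @ A.T + m * np.eye(m)            # SPD observation-noise covariance
Rinv = np.linalg.inv(R)

# (B.1): R^{-1} = Sigma^T Sigma with Sigma upper triangular
L = np.linalg.cholesky(Rinv)           # lower triangular, Rinv = L @ L.T
Sigma = L.T

# (B.2): G = F^T H_bar F collapses to the Gramian Fbar^T Fbar
Fbar = Sigma @ Dh @ F
G = F.T @ Dh.T @ Rinv @ Dh @ F
assert np.allclose(G, Fbar.T @ Fbar)
```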

APPENDIX C

A Comparative Analysis

By way of comparing our approach with those in the adjoint sensitivity literature, we consider the adjoint sensitivity relation given in Eq. (8) of AH2007 (Ancell and Hakim 2007).

Solving Eq. (3.2) in this paper, it can be verified that the forward sensitivity matrix U(k), given by

 
U(k) = DM(k−1)DM(k−2)⋯DM(1)DM(0) = DM(k−1:0),
(C.1)

is indeed the same as the resolvent R(k − 1, 0) in Eq. (2) of AH2007. Consequently, Eq. (2) of AH2007, in our notation, takes the form

 
δx(k) = U(k)δx(0).
(C.2)

The induced first variation δJ in J is then given by

 
δJ = J[x(k) + δx(k)] − J[x(k)] = ⟨∇x(k)J, δx(k)⟩,
(C.3)

where ∇x(k)J ∈ Rⁿ is the gradient of J[x(k)] with respect to x(k).

Combining (C.2) and (C.3) and using the adjoint property, it follows from first principles that the adjoint gradient is given by

 
∇x(0)J = Uᵀ(k)∇x(k)J,
(C.4)

which is Eq. (8) in AH2007.

The analysis in AH2007 then proceeds to estimate the adjoint gradient using an ensemble approach. This is accomplished by building a linear regression between the ensemble of induced first variations {δJᵢ | 1 ≤ i ≤ M} and the ensemble of first variations in the initial conditions {δxᵢ(0) | 1 ≤ i ≤ M}, where M is the ensemble size. For further details, refer to AH2007.

However, in this paper we take the basic relation in (C.4) a step further by explicitly estimating ∇x(k)J. From the structure of J(c) in Eq. (2.6) of this paper, it can be verified that

 
∇x(k)J = Dhᵀ(k)R⁻¹e(k) = Dhᵀ(k)R⁻¹{z(k) − h[x(k)]} = Dhᵀ(k)R⁻¹{h[x̄(k)] − h[x(k)]} = Dhᵀ(k)R⁻¹Dh(k)δx(k) = [Dhᵀ(k)R⁻¹Dh(k)]U(k)δx(0).
(C.5)

Substituting Eq. (C.5) into (C.4), we get

 
∇x(0)J = Uᵀ(k)[Dhᵀ(k)R⁻¹Dh(k)]U(k)δx(0) = Gδx(0),
(C.6)

which is the equation in (4.1) for the special case when x(0) is varied and α is held constant.
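Relation (C.6) can be checked on a linear toy model with a linear observation operator. The sign of the result depends on whether the control error is taken as true-minus-forecast or forecast-minus-true; the sketch below uses forecast-minus-true so that it matches the finite-difference gradient directly (all names and dimensions are our choices):

```python
import numpy as np

rng = np.random.default_rng(4)
n, m, k = 4, 3, 5
M = 0.4 * rng.standard_normal((n, n))   # linear model: x(j+1) = M x(j)
H = rng.standard_normal((m, n))         # linear observation operator
Rinv = np.eye(m)

U = np.linalg.matrix_power(M, k)        # (C.1): U(k) = DM(k-1)...DM(0) = M^k
x0_true = rng.standard_normal(n)
x0 = x0_true + 0.1 * rng.standard_normal(n)
z = H @ U @ x0_true                     # noise-free observation at time k

def J(c):
    e = z - H @ U @ c
    return 0.5 * float(e @ Rinv @ e)

# Finite-difference gradient of J with respect to the initial condition
eye = np.eye(n)
grad_fd = np.array([(J(x0 + 1e-6 * eye[i]) - J(x0 - 1e-6 * eye[i])) / 2e-6
                    for i in range(n)])

# (C.6): the adjoint gradient equals G times the control error
G = U.T @ H.T @ Rinv @ H @ U
assert np.allclose(grad_fd, G @ (x0 - x0_true), atol=1e-5)
```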

It is important to note that while the papers on adjoint sensitivity analysis in the literature consider sensitivity only with respect to initial conditions, our work deals simultaneously with sensitivity with respect to both initial conditions and parameters on the same footing.

REFERENCES

REFERENCES
Ancell
,
B.
, and
G. J.
Hakim
,
2007
:
Comparing adjoint- and ensemble-sensitivity analysis with applications to observation targeting
.
Mon. Wea. Rev.
,
135
,
4117
4134
, https://doi.org/10.1175/2007MWR1904.1.
Bénard
,
M.
,
1900
:
Les tourbillions cellulaires dans une nappe liquide
.
Rev. Gen. Sci. Pures Appl.
,
11
,
1261
1271
,
1309
1328
.
Bénard
,
M.
,
1901
:
Les tourbillons cellulaires dans une nappe liquid transportant de la chaleur par convection en permanent
.
Ann. Chim. Phys.
,
23
,
62
144
.
Berliner
,
L. M.
,
Z. Q.
Lu
, and
C.
Snyder
,
1999
:
Statistical design for adaptive weather observations
.
J. Atmos. Sci.
,
56
,
2536
2552
, https://doi.org/10.1175/1520-0469(1999)056<2536:SDFAWO>2.0.CO;2.
Bjerknes
,
J.
, and
E.
Palmén
,
1937
:
Investigations of selected European cyclones by ascents
.
Geofys. Publ.
,
12
,
1
62
.
Boussinesq
,
J.
,
1903
:
Théorie Analytique de la Chaleur
.
Gauthier-Villars
,
670
pp.
Burpee
,
R. W.
,
J. L.
Franklin
,
S. J.
Lord
,
R. E.
Tuleya
, and
S. D.
Aberson
,
1996
:
The impact of Omega dropwindsondes on operational hurricane track forecast models
.
Bull. Amer. Meteor. Soc.
,
77
,
925
933
, https://doi.org/10.1175/1520-0477(1996)077<0925:TIOODO>2.0.CO;2.
Chandrasekhar
,
S.
,
1961
:
Hydrodynamic and Hydromagnetic Stability
.
Clarendon Press
,
654
pp.
Cochran
,
W. G.
, and
G. M.
Cox
,
1992
:
Experimental Designs
. 2nd ed.
John Wiley and Sons
,
640
pp.
Eliassen
,
A.
,
1995
:
Jacob Aall Bonnevie Bjerknes (1897–1975): Biographical Memoir
.
National Academies Press
,
21
pp.
Evans
,
M. N.
,
A.
Kaplan
, and
M. A.
Cane
,
1998
:
Optimal sites for coral-based reconstruction of global sea surface temperature
.
Paleoceanography
,
13
,
502
516
, https://doi.org/10.1029/98PA02132.
Hakim
,
G. J.
, and
R. D.
Torn
,
2008
:
Ensemble synoptic analysis. Synoptic–Dynamic Meteorology and Weather Analysis and Forecasting, Meteor. Monogr., No. 55, Amer. Meteor. Soc., 147–161
, https://doi.org/10.1007/978-0-933876-68-2_7.
Koch
,
S.
,
M.
Fengler
,
P. B.
Chilson
,
K. L.
Elmore
,
B.
Argrow
,
D. L.
Andra
Jr.
, and
T.
Lindley
,
2018
:
On the use of unmanned aircraft for sampling mesoscale phenomena in the preconvective boundary layer
.
J. Atmos. Oceanic Technol.
,
35
,
2265
2288
, https://doi.org/10.1175/JTECH-D-18-0101.1.
Kotsuki
,
S. K.
,
K.
Kurosawa
, and
T.
Miyoshi
,
2019
:
On the properties of ensemble forecast sensitivity to observations
.
Quart. J. Roy. Meteor. Soc.
,
145
,
1897
1914
, https://doi.org/10.1002/qj.3534.
Lakshmivarahan, S., and J. M. Lewis, 2010: Forward sensitivity based approach to dynamic data assimilation. Adv. Meteor., 2010, 375615, https://doi.org/10.1155/2010/375615.
Lakshmivarahan, S., J. M. Lewis, and D. Phan, 2013: Data assimilation as a problem in optimal tracking: Application of Pontryagin’s minimum principle. J. Atmos. Sci., 70, 1257–1277, https://doi.org/10.1175/JAS-D-12-0217.1.
Lakshmivarahan, S., J. M. Lewis, and R. Jabrzemski, 2017: Forecast Error Correction Using Dynamic Data Assimilation. Springer, 270 pp., https://doi.org/10.1007/978-3-319-39997-3.
Lakshmivarahan, S., J. M. Lewis, and J. Hu, 2019a: Saltzman’s model: Complete characterization of solution properties. J. Atmos. Sci., 76, 1587–1608, https://doi.org/10.1175/JAS-D-17-0344.1.
Lakshmivarahan, S., J. M. Lewis, and J. Hu, 2019b: On controlling the shape of the cost functional in dynamic data assimilation: Guidelines for placement of observations—Part 1. University of Oklahoma School of Computer Science Tech. Rep., 39 pp.
Langland, R. H., and N. L. Baker, 2004: Estimation of observation impact using the NRL atmospheric variational data assimilation adjoint system. Tellus, 56A, 189–201, https://doi.org/10.1111/J.1600-0870.2004.00056.X.
Langland, R. H., and Coauthors, 1999: The North Pacific Experiment (NORPEX-98): Targeted observations for improved North American weather forecasts. Bull. Amer. Meteor. Soc., 80, 1363–1384, https://doi.org/10.1175/1520-0477(1999)080<1363:TNPENT>2.0.CO;2.
Le Dimet, F. X., and O. Talagrand, 1986: Variational algorithms for analysis and assimilation of meteorological observations: Theoretical aspects. Tellus, 38A, 97–110, https://doi.org/10.1111/j.1600-0870.1986.tb00459.x.
Lewis, J. M., and J. C. Derber, 1985: The use of adjoint equations to solve a variational adjustment problem with advective constraints. Tellus, 37A, 309–322, https://doi.org/10.3402/tellusa.v37i4.11675.
Lewis, J. M., S. Lakshmivarahan, and S. K. Dhall, 2006: Dynamic Data Assimilation: A Least Squares Approach. Cambridge University Press, 654 pp.
Lewis, J. M., S. Lakshmivarahan, and J. Hu, 2019: A criterion for choosing observation sites in data assimilation: Applied to Saltzman’s convection model—Part 2. University of Oklahoma School of Computer Science Tech. Rep., 41 pp.
Lorenz, E. N., 1963: Deterministic nonperiodic flow. J. Atmos. Sci., 20, 130–141, https://doi.org/10.1175/1520-0469(1963)020<0130:DNF>2.0.CO;2.
Lorenz, E. N., 1993: The Essence of Chaos. University of Washington Press, 227 pp.
Lorenz, E. N., and K. A. Emanuel, 1998: Optimal sites for supplementary weather observations: Simulation with a small model. J. Atmos. Sci., 55, 399–414, https://doi.org/10.1175/1520-0469(1998)055<0399:OSFSWO>2.0.CO;2.
Majumdar, S. J., 2016: A review of targeted observations. Bull. Amer. Meteor. Soc., 97, 2287–2303, https://doi.org/10.1175/BAMS-D-14-00259.1.
Majumdar, S. J., and Coauthors, 2011: Targeted observations for improving numerical weather prediction: An overview. WMO Rep. WWRP/THORPEX 15, 37 pp., www.wmo.int/pages/prog/arep/wwrp/new/documents/THORPEX_No_15.pdf.
Malkus, W. V. R., and G. Veronis, 1958: Finite amplitude cellular convection. J. Fluid Mech., 4, 225–260, https://doi.org/10.1017/S0022112058000410.
Manohar, K., B. W. Brunton, J. N. Kutz, and S. L. Brunton, 2018: Data-driven sparse sensor placement for reconstruction: Demonstrating the benefits of exploiting known patterns. IEEE Control Syst. Mag., 38, 63–86, https://doi.org/10.1109/MCS.2018.2810460.
Meyer, C. D., 2000: Matrix Analysis and Applied Linear Algebra. SIAM, 718 pp.
Narendra, K. S., and A. Annaswamy, 2005: Stable Adaptive Systems. Dover Publications, 496 pp.
Rayleigh, L., 1916: Convection currents in a horizontal layer of fluid, when higher temperature is on the underside. Philos. Mag., 32, 529–546, https://doi.org/10.1080/14786441608635602.
Saltzman, B., 1962: Finite amplitude free convection as an initial value problem—I. J. Atmos. Sci., 19, 329–341, https://doi.org/10.1175/1520-0469(1962)019<0329:FAFCAA>2.0.CO;2.
Tolman, R. C., 2010: Principles of Statistical Mechanics. Dover Publications, 704 pp.
Torn, R. D., and G. J. Hakim, 2008: Ensemble-based sensitivity analysis. Mon. Wea. Rev., 136, 663–677, https://doi.org/10.1175/2007MWR2132.1.
For information regarding reuse of this content and general copyright information, consult the AMS Copyright Policy (www.ametsoc.org/PUBSReuseLicenses).

Footnotes

1. If A ∈ R^(m×n), then the symmetric matrices AA^T and A^T A are known as Gramians of A.
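As a quick numerical illustration of this footnote (a minimal NumPy sketch, not taken from the paper), both Gramians are symmetric positive semidefinite, and the trace of G = F^T F equals the squared Frobenius norm of F, the identity invoked in the abstract's placement criterion:

```python
import numpy as np

rng = np.random.default_rng(0)
F = rng.standard_normal((5, 3))  # an arbitrary real m-by-n matrix

G = F.T @ F  # Gramian, n-by-n

# G is symmetric ...
assert np.allclose(G, G.T)
# ... and positive semidefinite (all eigenvalues >= 0, up to roundoff)
assert np.all(np.linalg.eigvalsh(G) >= -1e-12)

# trace(G) = sum of eigenvalues of G = squared Frobenius norm of F
fro_sq = np.linalg.norm(F, "fro") ** 2
assert np.isclose(np.trace(G), fro_sq)
assert np.isclose(np.linalg.eigvalsh(G).sum(), fro_sq)
```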

2. The term adjoint sensitivity or gradient derives its name from the adjoint method used to compute it. Refer to Lewis et al. (2006, chapters 22–25) for details.

3. If f: R^n → R is continuously differentiable, then δf = ⟨∇f, δx⟩ (Lewis et al. 2006, appendix C).

4. Let A ∈ R^(n×n); then the null space NULL(A) of the matrix A is the subspace of R^n consisting of all vectors x ∈ R^n such that Ax = 0.
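The null space in this footnote can be computed numerically; a small sketch (illustrative only, using an arbitrary rank-deficient matrix) extracts a basis for NULL(A) from the right singular vectors associated with zero singular values:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 4.0]])  # rank 1, so NULL(A) is one-dimensional

# Right singular vectors whose singular value is (numerically) zero
# span the null space of A.
U, s, Vt = np.linalg.svd(A)
null_basis = Vt[s < 1e-12].T  # columns form a basis for NULL(A)

# Every vector in the null space satisfies Ax = 0.
x = null_basis[:, 0]
assert np.allclose(A @ x, 0.0)
```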

5. The graph of the Frobenius norm has a structure similar to that of the trace of G but a much greater magnitude. The relative difference in magnitudes is not a factor in identifying locations where sensitivities are large.