## Abstract

Over the decades, the role of observations in building and improving the fidelity of a model to a phenomenon has been well documented in the meteorological literature. More recently, adaptive/targeted observations have been routinely used to improve the quality of the analysis resulting from the fusion of data with models in a data assimilation scheme, and of the subsequent forecast. In this paper our goal is to develop an offline (preprocessing) diagnostic strategy for placing observations with the singular aim of reducing the forecast error/innovation in the context of classical 4D-Var. It is well known that the shape of the cost functional, as measured by its gradient (also called the adjoint gradient or sensitivity) in the control (initial condition and model parameter) space, determines the marching of the control iterates toward a local minimum. These iterates can become marooned in regions of control space where the gradient is small. An open question is how to avoid these “flat” regions by bounding the norm of the gradient away from zero. We answer this question in two steps. We first derive a linear transformation defined by a symmetric positive semidefinite (SPSD) Gramian $G=\bar{F}^T\bar{F}$ that directly relates the control error to the adjoint gradient. It is then shown that by placing observations where the square of the Frobenius norm of $\bar{F}$ (which is also the sum of the eigenvalues of $G$) is a maximum, we can indeed bound the norm of the adjoint gradient away from zero.

## 1. Introduction

Judicious selection of the type (pressure, humidity, and wind, to name a few) and placement (the location within a given spatiotemporal domain of interest) of weather observations to advance understanding of a phenomenon and the concomitant practical ability to improve forecasts has a rich history in meteorology. Among the most celebrated use of weather observations to understand the ubiquitous extratropical cyclone was the fieldwork of Scandinavian meteorologists Jacob Bjerknes and Eric Palmén (Bjerknes and Palmén 1937). They organized the simultaneous release of 120 radiosondes from 11 European countries over a 2-day period that led to diagnosis of motions in the upper troposphere along a 3500-mi (5630-km) latitudinal cross section. Further, the study made it clear that the meridional gradient of the Coriolis parameter was an important parameter in the dynamics of upper waves. This revelation inspired Carl-Gustaf Rossby to derive his celebrated wave formula (Eliassen 1995).

In the spirit of the Bjerknes–Palmén study, but in our current day, the use of research aircraft to obtain thermodynamic and wind structure in the vicinity of Atlantic hurricanes through use of Omega dropwindsondes (ODWs) has gone far to improve the track forecasting of hurricanes (Burpee et al. 1996). Transmission of these data to operational prediction centers in real time permitted inclusion of these data into the operational models. The improvement in track forecasting as tested on numerous hurricanes was impressive—track error reductions from 16% to 30%. Also impressive was the hurricane forecasting experience of the Burpee team that led them to choose observation locations on the hurricane’s periphery as well as near the storm’s center, where the thermodynamic and wind data were critically important as input to track forecasting. Majumdar (2016) and Majumdar et al. (2011) present an excellent review of “targeted observations,” which classified Burpee et al. (1996) as an example of targeting based on synoptic reasoning—not only an appropriate label for this contribution but also for the earlier contribution by Bjerknes and Palmén (1937).

Starting from the mid-1990s there has been a growing body of work within the meteorological literature related to (i) the analysis of the impact of initial conditions and observations on a chosen scalar metric of forecast quality (over a subdomain or the entire domain of interest); (ii) targeted/adaptive observation, aimed at finding the type of a new set of observations that will improve forecast quality, such as by reducing the analysis covariance; and (iii) “thinning” of large sets of correlated observations (as may arise with satellites or radars) and/or the creation of superobservations by computing, say, an average over a given subdomain and associating that observation with the centroid of the subdomain.

The techniques used in these areas may be broadly divided into two groups: those based on the adjoint approach rooted in classical variational analysis, and those using one of the many ensemble implementations of the Kalman filtering approach to data assimilation. While much of the literature concentrates on the choice of a suitable type of observation to improve forecast quality, a few studies also consider the distribution of observations toward a similar goal.

Specifically, Berliner et al. (1999) examine the application of statistical experimental design (Cochran and Cox 1992) to adaptively decide on supplementary observations that would improve the analyses. Torn and Hakim (2008), Ancell and Hakim (2007, hereafter referred to as AH2007), and Hakim and Torn (2008) use ensemble methods for targeting observations. AH2007 also contains a comparison of the adjoint-based and ensemble-based approaches to observation targeting. Lorenz and Emanuel (1998), Langland et al. (1999), Langland and Baker (2004), Evans et al. (1998), Koch et al. (2018), Manohar et al. (2018), Kotsuki et al. (2019), Majumdar et al. (2011), and Majumdar (2016) contain extensive discussions of “thinning,” the creation of superobservations, and targeted-observation strategies, including adjoint sensitivity, forecast sensitivity to observations, the ensemble transform filter method, and ensemble sensitivity. All these strategies are designed to improve analyses and ultimately reduce forecast error.

Our approach falls outside the strategies used in targeted/adaptive observations, although the goal has strong overlap with those methodologies. It is assumed that we have observations of a given type represented by a vector **Z** in the observation space *R*^{m}. It is well known that when the model map $M$ and/or the forward operator **h** (that relates the observation to the state) are nonlinear, the cost functional may exhibit multiple minima (Lewis et al. 2006, example 24.4.1 in chapter 24). In addition to these intrinsic multiple minima that depend purely on the given nonlinearities in $M$ and **h** and the covariance matrix of the observational noise, the shape of the cost functional in the control space (consisting of initial conditions and model parameters) is also critically dependent on the distribution (location) of the observations in the state space. Our goal is to control the shape of the cost functional by analyzing the dependence of its gradient (with respect to the control) on the (spatiotemporal) location of the observations. Namely, we force the magnitude of this gradient to be bounded away from zero outside the intrinsic local minima.

This is achieved through a strategy that decides on observation placement in space and time in the context of classical four-dimensional variational data assimilation (4D-Var) (LeDimet and Talagrand 1986; Lewis and Derber 1985). It is well known that the shape of the cost function, as measured by its gradient in control space, determines the marching of the control iterates toward a local minimum of the cost function. The iterates of the minimization algorithm can become marooned in regions where the gradient is very small. An open question is how to avoid these “flat” regions in control space by guaranteeing that the norm of the gradient of the cost function is bounded away from zero. While much of the earlier work deals with the sensitivity of the cost functional with respect to the initial conditions alone, our analysis treats initial conditions and parameters simultaneously on the same footing.

This question is answered in two steps. First, we derive a linear transformation defined by a symmetric positive semidefinite (SPSD) Gramian^{1} $G=\bar{F}^T\bar{F}$ that maps the error in the control to the gradient of the cost functional, alias the adjoint sensitivity or gradient,^{2} where the component matrix $\bar{F}$ is the product of 1) the forward sensitivity of the model solution, 2) the Jacobian of the forward operator, and 3) the square root of the inverse of the covariance matrix of the observation noise. Refer to Fig. 1 for an illustration. It is then shown that by placing observations where the square of the Frobenius norm of $\bar{F}$ attains a maximum, we can indeed bound the norm of the adjoint gradient away from zero.

In section 2, we provide a short summary of the forecast error correction problem by relating the adjoint sensitivity or adjoint gradient to the innovation or forecast error. By relating this forecast error to the model’s forward sensitivity, we derive our main results in section 3. The intrinsic relation that exists between the adjoint sensitivity and the control error is further analyzed in section 4. The impact of observation placement on controlling the shape of the cost functional is analyzed in section 5. Two examples are considered in section 6: an air–sea interaction problem in time alone, and Saltzman’s (1962) more complicated nonlinear model of convection in space and time. Both examples exhibit the methodology’s avoidance of flat gradients in control space. Section 7 contains recommendations for placement of observations and concluding remarks. Appendix A contains mathematical details that are key to the analysis in sections 3–5. The Gramian structure of $G$ is developed in appendix B. In the short appendix C, we further illustrate the difference between our approach and a typical adjoint sensitivity analysis in AH2007.

## 2. Forecast error correction problem

### a. Model equations

Let **x**(*k*) = [*x*_{1}(*k*), *x*_{2}(*k*), …, *x*_{n}(*k*)]^{T} ∈ *R*^{n} be the state of a nonlinear, deterministic, discrete-time dynamic model at times *k* = 0, 1, 2, …. Let **α** = (*α*_{1}, *α*_{2}, …, *α*_{p})^{T} ∈ *R*^{p} be the vector of *p* model parameters, and let $M$: *R*^{n} × *R*^{p} → *R*^{n} denote the (one-step state transition) model map, where $M$(**x**, **α**) = [*M*_{1}(**x**, **α**), *M*_{2}(**x**, **α**), …, *M*_{n}(**x**, **α**)]^{T} ∈ *R*^{n}. Let

$$\mathbf{x}(k+1) = M[\mathbf{x}(k), \boldsymbol{\alpha}] \tag{2.1}$$

be the given dynamical system whose solution starting from an initial state **x**(0) denotes the model forecast. Let **x**(*k*) = **x**[*k*, **x**(0), **α**] denote the solution of (2.1). Since the pair **c** = [**x**^{T}(0), **α**^{T}]^{T} ∈ *R*^{n} × *R*^{p} controls the evolution of **x**(*k*), it is known as the control. It is tacitly assumed that the model in (2.1) is the best known model for the process in question and is suitably parameterized to capture the inherent natural variations. Clearly, simultaneous estimation of the initial conditions and parameters in dynamic data assimilation puts this on a par with the goals and methods used in the “adaptive control” literature (Narendra and Annaswamy 2005).
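As a concrete illustration of the setup in (2.1), the following minimal sketch iterates a hypothetical one-parameter scalar map; the particular map is our stand-in for illustration only, not a model from the paper.

```python
import numpy as np

# Hypothetical one-step state transition map M(x, alpha), a stand-in for (2.1).
def M(x, alpha, dt=0.1):
    """One-step map x(k+1) = M[x(k), alpha] (toy relaxation toward 1)."""
    return x + dt * alpha * (1.0 - x)

def forecast(x0, alpha, num_steps):
    """Run the model forward from the control c = [x(0), alpha]."""
    traj = [x0]
    for _ in range(num_steps):
        traj.append(M(traj[-1], alpha))
    return np.array(traj)

traj = forecast(x0=0.5, alpha=0.25, num_steps=50)
```

Here the control **c** consists of the scalar initial condition `x0` and the single parameter `alpha`; changing either changes the entire trajectory.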

### b. Observation

Let **z**(*k*) ∈ *R*^{m} be the vector observation at time *k* given by

$$\mathbf{z}(k) = \mathbf{h}[\bar{\mathbf{x}}(k)] + \boldsymbol{\xi}(k), \tag{2.2}$$

where **h**: *R*^{n} → *R*^{m} and **h**(**x**) = [*h*_{1}(**x**), *h*_{2}(**x**), …, *h*_{m}(**x**)]^{T} ∈ *R*^{m} is called the forward operator that relates the model space *R*^{n} and the observation space *R*^{m}; $\bar{\mathbf{x}}(k)$ is the true but unknown state of nature at time *k*; and **ξ**(*k*) ~ *N*[0, $R$(*k*)] is a zero mean, temporally uncorrelated, Gaussian noise with $R$(*k*) ∈ *R*^{m×m} as its covariance matrix at time *k*. It is assumed that $R$(*k*) is known and positive definite in addition to being symmetric. In the case that the forward operator is linear, (2.2) takes the form **z**(*k*) = $H$**x**(*k*) + **ξ**(*k*), where $H$ ∈ *R*^{m×n} is a matrix that takes model output and converts it to the model’s counterpart of the observations.
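A minimal numerical sketch of the linear observation model **z**(*k*) = $H$**x**(*k*) + **ξ**(*k*); the dimensions, the operator `H`, and the covariance `R` below are toy values chosen purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 3, 2
H = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])       # linear forward operator, m x n
R = np.diag([0.04, 0.09])             # observation-noise covariance (SPD)

x_true = np.array([0.7, -0.2, 1.1])   # stand-in for the true state at time k
xi = rng.multivariate_normal(np.zeros(m), R)   # noise xi(k) ~ N(0, R)
z = H @ x_true + xi                   # observation z(k)
```

The operator `H` simply extracts the first two state components, a common idealization when only part of the state is observed.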

### c. Assumption

It is assumed that the model map $M$ in (2.1), the forward operator **h** in (2.2), and the model solution **x**(*k*) are continuously differentiable in their arguments. That is, the following Jacobians are well defined for all **x**, **α**, and *k* ≥ 0:

$$D_M(k) = \frac{\partial M}{\partial \mathbf{x}} \in R^{n\times n}, \quad D_M^{\alpha}(k) = \frac{\partial M}{\partial \boldsymbol{\alpha}} \in R^{n\times p}, \quad D_h(k) = \frac{\partial \mathbf{h}}{\partial \mathbf{x}} \in R^{m\times n}, \tag{2.3a}$$

evaluated along the model solution, together with the forward sensitivity matrices

$$U(k) = \frac{\partial \mathbf{x}(k)}{\partial \mathbf{x}(0)} \in R^{n\times n}, \quad V(k) = \frac{\partial \mathbf{x}(k)}{\partial \boldsymbol{\alpha}} \in R^{n\times p}. \tag{2.3b}$$

### d. Innovation/forecast error

Let **z**(*k*) be a single observation given at time *k*. The difference

$$\bar{\mathbf{e}}(k) = \mathbf{z}(k) - \mathbf{h}[\mathbf{x}(k)]$$

is called the innovation or the forecast error. Refer to Fig. 2 for an illustration.

### e. A classification of forecast errors

For the purposes of our analysis in this paper, it is useful to classify the forecast errors into three types (Lakshmivarahan and Lewis 2010; Lakshmivarahan et al. 2017). Type 1 error is said to occur when the model is assumed to be perfect and the true control $\bar{\mathbf{c}} = [\bar{\mathbf{x}}^T(0), \bar{\boldsymbol{\alpha}}^T]^T$ is known. In this case, $\mathbf{x}(k) = \bar{\mathbf{x}}(k)$, and the forecast error is temporally uncorrelated, pure Gaussian noise that cannot be controlled. Type 2 error occurs when the model is perfect, but the control **c** differs from the true but unknown control $\bar{\mathbf{c}}$. In this case, the forecast error $\bar{\mathbf{e}}(k)$ depends only on the control error *δ***c** = [*δ***x**^{T}(0), *δ**α*^{T}]^{T}.

Type 3 error is said to occur when the model is not perfect and the true control is not known. In this case, the forecast error is confounded by the unknown model error and the unknown control error. In this paper, we confine our analysis to the simpler case of type 2 errors. Estimation of type 3 errors is contained in Lakshmivarahan et al. (2013).

### f. Statement of the problem

The forecast error correction problem is usually recast as the minimization of a weighted sum of squared errors. To this end, define a cost functional *J*: *R*^{n} × *R*^{p} → $R$ given by

$$J(\mathbf{c}) = \frac{1}{2}\sum_{k} \langle \bar{\mathbf{e}}(k),\; R^{-1}(k)\,\bar{\mathbf{e}}(k) \rangle,$$

where ⟨**a**, **b**⟩ denotes the standard inner product. Again, we hasten to add that if prior information on the initial condition **x**(0), on the parameter **α**, or on both is available, we could add a quadratic penalty term to the cost function, which would contribute another term in the expression for the adjoint sensitivity.

Since $e\xaf\u2061(k)$ depends on the forecast **x**(*k*) that in turn is a function of the control **c**, the cost functional is an implicit function of **c** through the given model Eq. (2.1). We hasten to add that since the cost functional is additive in the number of observations, without loss of generality, it is assumed that there is only one observation **z**(*k*) ∈ *R*^{m} at time *k*. This constrained minimization problem is usually solved as a strong constraint problem using the standard Lagrangian multiplier method. The resulting framework has come to be known as the adjoint method (see Lewis et al. 2006, chapters 22–24).

## 3. Fine structure of adjoint sensitivity

Instead of determining the adjoint sensitivity through use of the tangent linear model as typically done in the standard 4D-Var mechanics (see Lewis et al. 2006, chapter 24), we derive an alternate expression for *δ***x**(*k*) in terms of the forecast sensitivity to control and the associated incremental change in control.

### a. Alternate expression for δ**x**(k)

The first-order Taylor expansion for *δ***x**(*k*) is given by

$$\delta \mathbf{x}(k) = U(k)\,\delta \mathbf{x}(0) + V(k)\,\delta \boldsymbol{\alpha}, \tag{3.1}$$

where the dynamics of evolution of the forward sensitivity matrices $U$(*k*) and $V$(*k*) defined in (2.3b) are given in the following (see Lakshmivarahan et al. 2017, chapters 2 and 3):

$$U(k+1) = D_M(k)\,U(k), \quad U(0) = I, \tag{3.2}$$

$$V(k+1) = D_M(k)\,V(k) + D_M^{\alpha}(k), \quad V(0) = 0. \tag{3.3}$$
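The evolution of the forward sensitivities $U$(*k*) and $V$(*k*) can be sketched numerically. The scalar map below is our hypothetical stand-in (not a model from the paper), with its Jacobians $D_M$ and $D_M^{\alpha}$ computed analytically.

```python
import numpy as np

dt = 0.1

def M(x, alpha):
    """Toy one-step map, a stand-in for (2.1)."""
    return x + dt * alpha * (1.0 - x)

def D_M(x, alpha):
    """Jacobian dM/dx (here a 1x1 matrix)."""
    return np.array([[1.0 - dt * alpha]])

def D_M_alpha(x, alpha):
    """Jacobian dM/dalpha (here a 1x1 matrix)."""
    return np.array([[dt * (1.0 - x)]])

def forward_sensitivities(x0, alpha, num_steps):
    """Propagate U(k) = dx(k)/dx(0) and V(k) = dx(k)/dalpha
    using the recursions U(k+1) = D_M U(k), V(k+1) = D_M V(k) + D_M^alpha."""
    x = x0
    U = np.eye(1)          # U(0) = I
    V = np.zeros((1, 1))   # V(0) = 0
    Us, Vs = [U], [V]
    for _ in range(num_steps):
        Dm, Dma = D_M(x, alpha), D_M_alpha(x, alpha)
        U, V = Dm @ U, Dm @ V + Dma
        x = M(x, alpha)
        Us.append(U)
        Vs.append(V)
    return Us, Vs

Us, Vs = forward_sensitivities(x0=0.5, alpha=0.25, num_steps=40)
```

For this map $D_M$ is the constant 1 − Δt·α, so $U(k) = (1 - \Delta t\,\alpha)^k$, which the recursion reproduces.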

### b. An alternate expression for the forecast error $\bar{\mathbf{e}}(k)$

Using (2.2), we get an expression for the actual forecast error as

$$\bar{\mathbf{e}}(k) = \mathbf{z}(k) - \mathbf{h}[\mathbf{x}(k)] = \mathbf{h}[\bar{\mathbf{x}}(k)] - \mathbf{h}[\mathbf{x}(k)] + \boldsymbol{\xi}(k). \tag{3.4}$$

But using the expression for *δ***x**(*k*) in (2.5), a first-order approximation **e**(*k*) to $\bar{\mathbf{e}}(k)$ is given by

$$\mathbf{e}(k) = D_h(k)\,\delta\mathbf{x}(k) + \boldsymbol{\xi}(k), \tag{3.5}$$

where the Jacobian $D$_{h}(*k*) ∈ *R*^{m×n} is given in (2.3a). Further, substituting (3.1) in (3.5) and regrouping, we get an alternate expression for **e**(*k*) as

$$\mathbf{e}(k) = A(k)\,\delta\mathbf{x}(0) + B(k)\,\delta\boldsymbol{\alpha} + \boldsymbol{\xi}(k), \tag{3.6a}$$

where $A$(*k*) = $D$_{h}(*k*)$U$(*k*) and $B$(*k*) = $D$_{h}(*k*)$V$(*k*).

Indeed, the correction *δ***c** to the control **c** needed to reduce the forecast error (modulo observation noise) can be readily obtained by solving a linear least squares problem (Lakshmivarahan et al. 2017, chapter 2) obtained by rewriting (3.6a) as

$$\mathbf{e}(k) = S(k)\,\delta\mathbf{c} + \boldsymbol{\xi}(k),$$

where $S$(*k*) = [$A$(*k*), $B$(*k*)] ∈ *R*^{m×(n+p)}.
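To illustrate, the least squares recovery of *δ***c** from (3.6a) can be sketched with toy matrices (random placeholders, not values from the paper). In the noise-free case the correction is recovered exactly whenever $S$(*k*) has full column rank.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, p = 6, 2, 1                       # toy dimensions, m >= n + p
S = rng.standard_normal((m, n + p))     # stand-in for S(k) = [A(k), B(k)]
dc_true = np.array([0.3, -0.2, 0.5])    # control error to be recovered
e = S @ dc_true                         # noise-free innovation e(k) = S(k) dc

# Least squares estimate of the control correction dc
dc_hat, *_ = np.linalg.lstsq(S, e, rcond=None)
```

With observation noise added to `e`, the same call returns the minimum-norm least squares solution rather than the exact control error.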

Remark 3.1: In the 4D-Var framework, the adjoint sensitivity is computed using the actual forecast error, $e\xaf\u2061(k)$ in (3.4). Refer to chapter 24 in Lewis et al. (2006) for details. However, in this alternate approach, we used the first-order approximation **e**(*k*) in (3.5). When **h**(**x**) is linear, that is, **h**(**x**) = $H$**x**, then $e\xaf\u2061(k)=e\u2061(k)$. This alternate method for quantifying the forecast error is known as the first-order forward sensitivity method (FSM) and is closely related to the 4D-Var formulation. For details refer to Lakshmivarahan and Lewis (2010) and the monograph Lakshmivarahan et al. (2017).

### c. Fine structure of adjoint sensitivity

It can be verified that the exact first variation *δJ*_{e} in *J*(**c**) resulting from the initial variation *δ***c** in **c** is given by (see Lewis et al. 2006, chapter 24)

$$\delta J_e = -\langle \bar{\mathbf{e}}(k),\; R^{-1}(k)\,D_h(k)\,\delta\mathbf{x}(k) \rangle. \tag{3.7}$$

Replacing $\bar{\mathbf{e}}(k)$ in (3.7) by **e**(*k*), we get an approximation *δJ* to *δJ*_{e} given by

$$\delta J = -\langle \mathbf{e}(k),\; R^{-1}(k)\,D_h(k)\,\delta\mathbf{x}(k) \rangle. \tag{3.8}$$

Substituting **e**(*k*), $S$(*k*), $A$(*k*), and $B$(*k*) from (3.6)–(3.7) and *δ***x**(*k*) from (3.1) and regrouping, we express *δJ* as the sum of the deterministic component *δJ*^{d} and the stochastic component *δJ*^{n}, which are given as follows:

$$\delta J^d = -\langle S(k)\,\delta\mathbf{c},\; R^{-1}(k)\,S(k)\,\delta\mathbf{c} \rangle \tag{3.9}$$

and

$$\delta J^n = -\langle \boldsymbol{\xi}(k),\; R^{-1}(k)\,S(k)\,\delta\mathbf{c} \rangle. \tag{3.10}$$

Define a symmetric matrix

$$\bar{H} = D_h^T(k)\,R^{-1}(k)\,D_h(k) \in R^{n\times n} \tag{3.11}$$

that depends only on the observation system. Again, collecting the like terms and simplifying, it is immediate that

$$\delta J^d = -\langle \delta\mathbf{c},\; F^T\bar{H}F\,\delta\mathbf{c} \rangle \tag{3.12}$$

and

$$\delta J^n = -\langle \boldsymbol{\xi}(k),\; R^{-1}(k)\,D_h(k)\,F\,\delta\mathbf{c} \rangle, \tag{3.13}$$

where $F = [U(k), V(k)] \in R^{n\times(n+p)}$. From first principles,^{3} we can readily identify the gradients of *J*:

$$\nabla_{\mathbf{c}} J^d = -F^T\bar{H}F\,\delta\mathbf{c}, \qquad \nabla_{\mathbf{c}} J^n = -F^T D_h^T(k)\,R^{-1}(k)\,\boldsymbol{\xi}(k), \tag{3.14}$$

where

$$\nabla_{\mathbf{c}} J = \nabla_{\mathbf{c}} J^d + \nabla_{\mathbf{c}} J^n \quad\text{and}\quad \nabla_{\mathbf{c}} J = [\nabla_{\mathbf{x}(0)}^T J,\; \nabla_{\alpha}^T J]^T. \tag{3.15}$$

These relations clearly bring out the impact of the interaction resulting from the simultaneous variation of both the initial condition and the parameters, and of the observation noise covariance, on the adjoint sensitivities.

Setting

$$\bar{F} = R^{-1/2}(k)\,D_h(k)\,F \in R^{m\times(n+p)}, \tag{3.16}$$

where

$$R^{-1}(k) = R^{-1/2}(k)\,R^{-1/2}(k), \tag{3.17}$$

and

$$G = F^T\bar{H}F = \bar{F}^T\bar{F} \in R^{(n+p)\times(n+p)}$$

is the outer product matrix, which inherits the Gramian structure alluded to in the introduction. Stated in other words, there exists a linear relation between *δ***c**, the error in the control, and the deterministic part of the adjoint gradient, ∇_{c}*J*^{d}, and a linear relation between the observation noise **ξ**(*k*) and the stochastic component ∇_{c}*J*^{n}. This new expression for the adjoint sensitivity clearly brings out (i) the role of the model through the forward sensitivities $U$ and $V$ on one hand, and (ii) the role of the observation system through $\bar{H} = D_h^T R^{-1} D_h$ on the other. Indeed, this separability of the effect of the model and the observation system clearly provides the “fine structure” of adjoint sensitivity that, to our knowledge, is hitherto unknown in the literature.

Remark 3.2: The standard assumption about the observation noise **ξ**(*k*) is that it is a zero mean, temporally uncorrelated, Gaussian noise with $R$(*k*) as its covariance at time *k*; that is, **ξ**(*k*) ~ *N*[0, $R$(*k*)]. Hence, from (3.14), the expected value and covariance (Cov) are given by

$$E[\nabla_{\mathbf{c}} J^n] = 0$$

and

$$\mathrm{Cov}(\nabla_{\mathbf{c}} J^n) = F^T\bar{H}F = G, \tag{3.18}$$

using (3.10) and (3.13). Henceforth, following the tradition in statistical physics (Tolman 2010), we invoke the principle of mean field analysis and concentrate only on the deterministic component ∇_{c}*J*^{d} of the adjoint gradient. For simplicity of notation, in the following we identify ∇_{c}*J* with ∇_{c}*J*^{d}. Further, appendix C clearly brings out the difference between the approach of this paper and a typical adjoint sensitivity analysis in AH2007.

## 4. A tale of two vectors: *δ*c and ∇_{c}*J*—Further analysis

An immediate import of the above analysis is the derivation of the simple but key relation in (3.14), where the matrix $F^T\bar{H}F$ maps the control space *R*^{(n+p)} of the unknown control error vector *δ***c** into the space *R*^{(n+p)} of the adjoint sensitivity ∇_{c}*J*.

As a prelude to further analysis, let us simplify the notation by defining the following equivalences:

$$\mathbf{g} \equiv \nabla_{\mathbf{c}} J, \qquad \mathbf{f} \equiv \delta\mathbf{c} = \bar{\mathbf{c}} - \mathbf{c}, \qquad G \equiv F^T\bar{H}F. \tag{4.1}$$

Consequently, from (3.14) we obtain a fundamental relation

$$\mathbf{g} = -G\,\mathbf{f}. \tag{4.2}$$

Remark 4.1: An estimate of the Hessian of *J*(**c**) at the minimum: It is well known that if $A$ is an SPD matrix and **b** is a vector, then ∇$Q$(**x**) = $A$**x** − **b** if and only if $Q$(**x**) = (1/2)**x**^{T}$A$**x** − **b**^{T}**x** + *d* for some arbitrary constant *d* (Meyer 2000; Lewis et al. 2006, chapter 10). From (4.1) and (4.2), we get $\nabla_{\mathbf{c}} J = -G\mathbf{f} = -G(\bar{\mathbf{c}} - \mathbf{c}) = G\mathbf{c} - \mathbf{b}$, where $\mathbf{b} = G\bar{\mathbf{c}}$. Combining these two facts, we can locally approximate *J*(**c**) around $\bar{\mathbf{c}}$ by a quadratic functional given by $\bar{J}(\mathbf{c}) = (1/2)\,\mathbf{c}^T G\,\mathbf{c} - \mathbf{b}^T\mathbf{c}$. Hence, the Hessian of $\bar{J}(\mathbf{c})$ is given by $\nabla^2 J(\mathbf{c}) \approx \nabla^2 \bar{J}(\mathbf{c}) = G = F^T\bar{H}F$, which by (3.18) is also the covariance of ∇_{c}*J*. Clearly, this Hessian is independent of the observations, and its condition number controls the shape of *J*(**c**).

The meaning and the power of this new relation in (4.2) is captured in Fig. 1, from which it is evident that we can express **g** as the sum of two orthogonal components:

$$\mathbf{g} = \hat{\mathbf{g}} + \hat{\mathbf{g}}^{\perp}, \tag{4.3}$$

where $\hat{\mathbf{g}}$ is the orthogonal projection of **g** along **f** and $\hat{\mathbf{g}}^{\perp} = \mathbf{g} - \hat{\mathbf{g}}$.
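The orthogonal decomposition of **g** into its component along **f** and the remainder amounts to the usual projection formula; a minimal sketch:

```python
import numpy as np

def project(g, f):
    """Split g into its orthogonal projection along f and the remainder."""
    g_hat = (np.dot(g, f) / np.dot(f, f)) * f
    return g_hat, g - g_hat

# Toy vectors for illustration (not values from the paper)
g = np.array([1.0, 2.0, 0.0])
f = np.array([1.0, 0.0, 0.0])
g_hat, g_perp = project(g, f)
```

Only `g_hat`, the component aligned with the control error `f`, moves the iterate toward the true control; `g_perp` is orthogonal to it by construction.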

Thus, when we use **g** routinely in the minimization of the cost functional in the 4D-Var method (Lewis et al. 2006, chapters 22–24), it is now clear from Fig. 1 that it is not **g**, but only its component $\hat{\mathbf{g}}$, that is responsible for bringing about the reduction in the value of the cost functional and hence in the forecast error. An important question is: what factors affect the magnitude of $\hat{\mathbf{g}}$? The 4D-Var framework is silent on this question. Consequently, we now introduce a new measure for the effectiveness of an observation.

Definition 4.1: An observation **z**(*k*) ∈ *R*^{m} at time *k* is said to be effective if the magnitude of the orthogonal projection $\|\hat{\mathbf{g}}\|$ is large. If $\|\hat{\mathbf{g}}\|$ is small, then the observation is said to be ineffective.

It turns out that $\|\hat{\mathbf{g}}\|$ can be small for two mutually exclusive reasons. First, taking the norm of both sides of (4.2) and using (4.3), it is immediate that

$$\|\hat{\mathbf{g}}\| \le \|\mathbf{g}\| \le \|G\|\,\|\mathbf{f}\|. \tag{4.4}$$

Hence, if the magnitudes of the elements *G*_{i,j} of the matrix $G$ are simultaneously small, then $\|G\|$ is small and hence $\|\hat{\mathbf{g}}\|$ is small.

Stated in other words, if we place the observations in locations where the model is least sensitive, then from (4.4) it follows that the norm ‖**g**‖ is small, and in and around these locations the cost functional has flat patches in the control space *R*^{n+p}. Consequently, if $\|\hat{\mathbf{g}}\|$ is small, then the gradient method will bring very little improvement to forecast quality.

There is yet another situation where $\|\hat{\mathbf{g}}\|$ can be very small. This happens when the control error vector **f** lies either in the null space^{4} of $G$ or very close to it. In such cases, **g** is either orthogonal or nearly orthogonal to **f**, and so $\hat{\mathbf{g}}$ is very small even if $\|\mathbf{g}\|$ is large.

It is against this backdrop that we describe a framework in appendix A that directly relates the spectral properties of the matrix $G$ to their attendant impact on $\hat{\mathbf{g}}$.

Since **f** is unknown but a constant vector, from (A.19) we get a basic relation:

$$\|\hat{\mathbf{g}}\|\,\|\mathbf{f}\| = \mathbf{f}^T G\,\mathbf{f} = \sum_i d_i y_i^2 \ge \min_{y_i^2\neq 0}\{y_i^2\}\,\sum_i d_i, \tag{4.5}$$

where the $d_i$ ≥ 0 are the eigenvalues of $G$ and the $y_i$ are the coordinates of **f** in the orthonormal basis formed by the eigenvectors of $G$. From (A.20), we can express

$$\sum_i d_i = \mathrm{tr}(G) = \|\bar{F}\|_F^2, \tag{4.6}$$

which is the square of the Frobenius norm of $\bar{F}$.

## 5. Impact of observation placement

In exploring the properties of the projection $\hat{\mathbf{g}}$, and consequently the shape of the cost functional *J*(**c**) through the dependence of its gradient on the forward sensitivities, we now move to an analysis of the expressions in (3.12)–(3.16).

Case 1: **x**(0) is varied and **α** is held constant. In this case, *δ**α* = **0** and from (3.14)–(3.16) and appendix B,

$$\mathbf{g} = -\nabla_{\mathbf{x}(0)} J = [U^T\bar{H}U]\,\delta\mathbf{x}(0) = [\bar{U}^T\bar{U}]\,\delta\mathbf{x}(0). \tag{5.1}$$

Case a: *m* ≥ *n* and $\mathrm{Rank}(U^T\bar{H}U) = n$ (Meyer 2000). Then $\mathrm{NULL}(U^T\bar{H}U) = \{0\}$. From (4.5) and (4.6), it follows that

$$\|\hat{\mathbf{g}}\|\,\|\mathbf{f}\| \ge \min_{y_i^2\neq 0}\{y_i^2\}\,\sum_j \|\bar{U}_{*j}\|^2, \tag{5.2}$$

where $\bar{U}_{*j}$ is the *j*th column of $\bar{U}$ and **y** = (*y*_{1}, *y*_{2}, …, *y*_{n})^{T} ∈ *R*^{n} are the coordinates of *δ***x**(0) in the orthogonal basis formed by the eigenvectors of the Gramian $U^T\bar{H}U$. In (5.2), while we do not have any control over the magnitudes of the $y_i$, we can maximize the second factor on the right-hand side of (5.2) by placing the observations where the sum of the squares of the norms of the columns of $\bar{U}$ is a maximum.

Case b: *m* < *n* and $\mathrm{Rank}(U^T\bar{H}U) = m$. Then $\mathrm{NULL}(U^T\bar{H}U)$ has dimension (*n* − *m*). For any *δ***c** in this null space, **g** ≡ 0. Again from (4.5) and (4.6), it is immediate that

$$\|\hat{\mathbf{g}}\|\,\|\mathbf{f}\| \ge \min_{y_i^2\neq 0}\{y_i^2\}\,\sum_j \|U_{*j}\|^2. \tag{5.3}$$

The second factor on the right-hand side of (5.3) can be maximized by placing the observations where the sum of the squares of the norms of the first *m* columns of $U$ is a maximum.

Case 2: **α** is varied and **x**(0) is held constant. In this case, *δ***x**(0) = 0 and from (3.14)–(3.16), $\mathbf{g} = -\nabla_{\alpha} J = [V^T\bar{H}V]\,\delta\boldsymbol{\alpha}$, where we recall from appendix B that $V^T\bar{H}V = \bar{V}^T\bar{V} \in R^{p\times p}$. Again, two cases arise depending on whether $\mathrm{Rank}(V^T\bar{H}V) = p$ or $\mathrm{Rank}(V^T\bar{H}V) < p$. In the latter case, the null space of $V^T\bar{H}V$ has dimension greater than or equal to one, and for any *δ***c** in this null space, **g** ≡ 0. From (4.5) and (4.6), it can be verified that

$$\|\hat{\mathbf{g}}\|\,\|\mathbf{f}\| \ge \min_{y_i^2\neq 0}\{y_i^2\}\,\sum_j \|\bar{V}_{*j}\|^2. \tag{5.4}$$

By placing the observations in the time interval where the sum of the squares of the norms of the columns of $\bar{V}$ is a maximum, we can enforce or guarantee a nonzero lower bound on the norm of the adjoint gradient.

Case 3: Both **x**(0) and **α** are varied. From (4.5) and (4.6), we get

$$\|\hat{\mathbf{g}}\|\,\|\mathbf{f}\| \ge \min_{y_i^2\neq 0}\{y_i^2\}\left(\sum_i d_i\right). \tag{5.6}$$

Hence, by placing observations in the time intervals containing the maximum of the sum of the squares of the norms of the columns of $\bar{F}^T$, we bound the norm of $\hat{\mathbf{g}}$ as far away from zero as is feasible.

As a guide to applying the theory developed above, we summarize the key steps as follows.

### Algorithm for the placement of observations

Step 1: Pick a control **c** = [**x**^{T}(0), *α*^{T}]^{T} and run the model (2.1) forward in time to generate the forecast trajectory **x**(*k*) for *k* ∈ [0, *T*] for some large *T* > 0.

Step 2: Compute the Jacobians $D$_{$M$}(*k*), $D_M^{\alpha}(k)$, and $D$_{h}(*k*) in (2.3) along the trajectory in step 1.

Step 3: Compute the evolution of the forward sensitivity fields $U$(*k*) and $V$(*k*) using the linear, time-varying dynamics in (3.2) and (3.3).

Step 4: Compute the matrix $\bar{F}$ and the Gramian $G = \bar{F}^T\bar{F}$ along the trajectory.

Step 5: Plot the variation of one of the following norms versus time: (i) $\|G\|_F^2$, (ii) $\|G_{*j}\|^2$, or (iii) $G_{ij}^2$.

Step 6: Identify small time intervals containing the maxima in the plots generated in step 5. By sorting the values of these maxima in decreasing order, we can then determine the number and location of the observations.

Clearly, those time intervals containing the maximum of one or more of the quantities in step 5 of the algorithm given above are the preferred locations for the placement of observations. The idea is that, if for any reason we cannot place an observation at the maximum, we have the additional leverage of placing it close to it, within a small interval containing it.
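The steps above can be sketched end to end for a hypothetical scalar relaxation model (a simple Euler discretization chosen by us for illustration, with direct observations, i.e., h(x) = x, and unit noise variance, so that tr($G$) reduces to the sum of squared sensitivities):

```python
import numpy as np

# Toy model x(k+1) = x(k) + dt * beta * (xs - x(k)),
# control c = [x(0), xs, beta]; values below are illustrative only.
dt, T = 0.1, 300
x0, xs, beta = 2.0, 10.0, 0.3

x = x0
U = np.array([1.0])     # dx/dx(0), with U(0) = 1
V = np.zeros(2)         # dx/d(xs, beta), with V(0) = 0
trace_G = []
for k in range(T):
    Dm = 1.0 - dt * beta                        # dM/dx
    Dma = np.array([dt * beta, dt * (xs - x)])  # dM/d(xs, beta)
    U = Dm * U                                  # step 3: propagate U
    V = Dm * V + Dma                            # step 3: propagate V
    x = x + dt * beta * (xs - x)                # step 1: advance the forecast
    # h(x) = x and R = 1, so tr(G) = U^2 + V1^2 + V2^2 (steps 4-5)
    trace_G.append(U[0] ** 2 + V[0] ** 2 + V[1] ** 2)

# Step 6: preferred observation times are intervals around the largest tr(G)
best = np.argsort(trace_G)[::-1][:3]
```

In a real application, the sorted maxima of `trace_G` would be inspected over small intervals rather than taken pointwise, as described in step 6.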

## 6. Two examples

In this section, we illustrate the theory developed in the previous sections using two examples. The first is the 1D problem of air–sea interaction (Lakshmivarahan and Lewis 2010), where the control consists of initial and boundary conditions and a parameter. The second is the now famous example of Saltzman’s 7D lower-order model for convection, called SLOM (7). It is not out of place to mention that the initial analysis of SLOM (7) in Saltzman (1962) provided the key idea and the impetus for the discovery of the now famous Lorenz LOM (3) (Lorenz 1963) exhibiting deterministic chaos (Lorenz 1993, p. 137). Refer to the recent paper by Lakshmivarahan et al. (2019a) for more details on the relation between Saltzman (1962) and Lorenz (1963).

### a. Air–sea interaction problem: 1D model

Let *n* = 1 and *p* = 2. Let **α** = (*x*_{s}, *β*)^{T} ∈ *R*^{2} be the set of two parameters. Consider the 1D model equation:

$$\frac{dx}{dt} = \beta\,(x_s - x), \tag{6.1}$$

with *x*(0) as the initial condition. The control vector **c** = [*x*(0), *α*^{T}]^{T} ∈ *R*^{3}.

This model describes the dynamics of evolution of the normalized temperature *x*(*t*) of the cold continental air as it moves with the prevailing wind over the warm ocean, which is assumed to have a constant sea surface temperature *x*_{s}. Clearly, *x*_{s} denotes the boundary condition, and *β* is known as the turbulent heat exchange parameter. See Lakshmivarahan and Lewis (2010) for more details.

The solution of (6.1) is given by

$$x(t) = x_s - \eta\,e^{-\beta t}, \tag{6.2}$$

where *η* = *x*_{s} − *x*(0). Exact expressions for the forward sensitivities *U*, *V*_{1}, *V*_{2}, the matrix $F^T\bar{H}F$, and the components of the adjoint sensitivities are given in Table 1.

A plot of the base solution $\bar{x}(t)$ of (6.1) starting from the base control $\bar{\mathbf{c}} = [\bar{x}(0), \bar{x}_s, \bar{\beta}]^T = (1.0, 11, 0.25)^T$ is given in Fig. 3. In this and the rest of the figures, the discrete time *k* corresponds to the continuous time *t* = *k*Δ*t*, where Δ*t* = 0.1. Let **c** = [*x*(0), *x*_{s}, *β*]^{T} = (2.0, 10, 0.3)^{T} be the control for the forecast; the model forecast *x*(*t*) using (6.1) starting from **c** is plotted in Fig. 4. The evolution of the forecast sensitivities of the solution of (6.1) for the same initial condition as in Fig. 4 is given in Fig. 5.

The variation with time of the inner product of the time-varying (row) sensitivity vector **F** in Table 1 and the constant control error vector $\mathbf{f} = \delta\mathbf{c} = \bar{\mathbf{c}} - \mathbf{c}$ is given in Fig. 6. It can be verified from Fig. 6 that at a time close to *t** ≈ 4.4, the sensitivity vector **F** is orthogonal to *δ***c**, as indicated by the vanishing of the inner product. That is, *δ***c** at this time instant lies in the null space of the rank-one, symmetric matrix $G = F^T\bar{H}F$ given in Table 1. The variation of the norm of the gradient ‖**g**‖ given in Table 1 is illustrated in Fig. 7. Clearly, ‖**g**‖ vanishes at *t** ≈ 4.4 and is small in its neighborhood.

The variation with time of the squares of the individual forward sensitivities *U*, *V*_{1}, and *V*_{2} in Table 1 is given in Fig. 8. Clearly, *U*^{2} is a maximum close to the initial time, and $V_1^2$ attains its maximum value when *t* is large. But $V_2^2$ attains its maximum at time *t*** ≈ (1/*β*) = 3.33 (since *β* = 0.3 is used for the forecast). Hence, by placing three observations, say at times *t*_{1} = 1, *t*_{2} = 20, and *t*_{3} = 3.3, we can capture information on the initial condition *x*(0), the boundary condition *x*_{s}, and the parameter *β*. Combining (4.6) with the expressions in Table 1, it can be verified (assuming *σ*^{2} = 1) that $\mathrm{tr}(G) = \mathrm{tr}(F^T\bar{H}F) = U^2 + V_1^2 + V_2^2$. Since *U*, *V*_{1}, and *V*_{2} vary in their magnitudes, to bring out the key features in the sum, a plot of the scaled version $(1/50)(U^2 + V_1^2 + V_2^2)$ against time is contained in Fig. 9. One can pick an equivalent set of three time instances for placing observations using the plot in Fig. 9.
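Writing the solution of (6.1) as x(t) = x_s − ηe^{−βt} with η = x_s − x(0), the forward sensitivities follow by direct differentiation with respect to the control components; the closed forms below are our own derivation, intended to match the entries of Table 1, and they confirm that the maximum of V₂ falls at t = 1/β.

```python
import numpy as np

x0, xs, beta = 2.0, 10.0, 0.3          # forecast control from the text
eta = xs - x0

# Forward sensitivities obtained by differentiating the solution
# x(t) = xs - eta * exp(-beta t) with respect to x(0), xs, and beta:
def U(t):
    return np.exp(-beta * t)           # dx/dx(0)

def V1(t):
    return 1.0 - np.exp(-beta * t)     # dx/dxs

def V2(t):
    return eta * t * np.exp(-beta * t) # dx/dbeta

# Locate the maximum of V2 on a fine time grid; it should sit at t = 1/beta
t = np.linspace(0.0, 30.0, 3001)
t_star = t[np.argmax(V2(t))]
```

Consistent with Fig. 8, `U` is largest at the initial time, `V1` grows toward 1 for large t, and `V2` peaks near t ≈ 3.33 for β = 0.3.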

We conclude this discussion by reporting the results of two experiments in Table 2. In the first experiment, four observations are chosen at times where the sensitivities are large, and in the second, at times where the sensitivities are small. The entries in the last column of Table 2 clearly illustrate the theory developed in this paper.

### b. Application to Saltzman’s Model: SLOM (7)

The strategy for numerical experiments follows the "identical twin" approach, where two sets of model controls are specified—in this case of Saltzman's spectral model, the Rayleigh number (Rayleigh 1916) and the model's initial condition. One set is labeled "true control," and the other set, purposely made different from the true set, is labeled "forecast control." Variational data assimilation is used to correct the forecast control and bring it closer to the true control. The theme of the experiment is to choose observation points that guarantee avoidance of "flatness" of the cost function in the space of control. The choice of these points rests on the foregoing theory that identifies a matrix and its trace. The location of these points is based on a two-part process:

#### 1) Part 1: Choice of location

##### (i) Stage 1: Controls specified

The true control $\bar{\mathbf{c}}$ and forecast control **c** are specified. The evolution of the true state and the forecast state are followed for a sufficiently long time to note differences in the convective regimes (as in step 1 of the algorithm in section 5).

##### (ii) Stage 2: Determination of forecast sensitivities and associated $G$ matrix

Forecast sensitivities to the control **c** are found through solution of the augmented set of governing equations for Saltzman's model. From these sensitivities and the forward operator [see section 6b(4) below]—the operator that determines the model counterpart to observations—the $G$ matrix and tr($G$) are determined. In the computation of $G$ and its trace, only the forward sensitivities to control and the model counterpart of the observations are used; actual observations are not needed.

##### (iii) Stage 3: Observation placement

Observation locations are chosen in the (*x*, *z*, *t*) space; the centroids of large values of tr($G$) are the ideal locations.

#### 2) Part 2: Subsequent data assimilation

##### (i) Stage 4: Cost function *J*

The cost function *J* is defined as the sum of squared differences between the observations and the associated forecasts at the points in time and space where observation locations are chosen (observations at that point in time are based on true control and forecasts are based on erroneous control).

##### (ii) Stage 5: Variational data assimilation

The variational data assimilation method is used to adjust control and bring forecast control *closer to true control*.

Remark 6.1: Knowledge of the sensitivities found in stage 2 allows calculation of the cost-functional gradient without the adjoint equations—an alternative form of 4D-Var. This alternative method is used in the numerical experiments that follow.
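The five stages above can be sketched end to end on a toy scalar model. Everything below (the model, the finite-difference sensitivities, the step size) is an illustrative assumption, not the paper's configuration; the flow of stages is what is being shown.

```python
import numpy as np

# Hedged end-to-end sketch of the two-part strategy on a toy scalar model
# x_{k+1} = x_k - dt*alpha*x_k. Control c = (x0, alpha); observations are
# direct (h = x) with unit error variance.
dt, nsteps = 0.1, 60

def forecast(x0, alpha):
    x = np.empty(nsteps + 1); x[0] = x0
    for k in range(nsteps):
        x[k + 1] = x[k] - dt * alpha * x[k]
    return x

def sensitivities(x0, alpha, eps=1e-6):
    # Part 1, stage 2: forward sensitivities F = [dx/dx0, dx/dalpha]
    # (finite differences stand in for the augmented sensitivity equations)
    base = forecast(x0, alpha)
    dU = (forecast(x0 + eps, alpha) - base) / eps
    dV = (forecast(x0, alpha + eps) - base) / eps
    return np.stack([dU, dV], axis=1)          # shape (nsteps+1, 2)

# Stage 1: true vs forecast control
c_true, c_fc = (1.0, 0.5), (1.2, 0.6)
F = sensitivities(*c_fc)

# Stage 3: place the single observation where tr(G) = ||F(k)||^2 is largest
trG = (F**2).sum(axis=1)
k_obs = int(np.argmax(trG))

# Part 2, stages 4-5: cost J = (z - x_fc)^2 at k_obs; one descent step
z = forecast(*c_true)[k_obs]
x_fc = forecast(*c_fc)
grad = -2.0 * (z - x_fc[k_obs]) * F[k_obs]     # dJ/dc via forward sensitivities
c_new = np.array(c_fc) - 0.05 * grad
print("obs index:", k_obs, "updated control:", c_new)
```

Note that, as in remark 6.1, the gradient of *J* is assembled directly from the forward sensitivities, with no adjoint integration.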

#### 3) Governing equations

Saltzman (1962) assumed a seven-mode spectral model of convection that obeys the Boussinesq (1903) equations describing “roll” convection between two free surfaces maintained at constant temperature difference. Convection takes place in a thin layer of fluid heated from below. The convective motions take place in the *x*–*z* plane; that is, there is no variation in the *y* direction where *x* and *y* represent the horizontal Cartesian axes. The vertical coordinate *z* is perpendicular to the horizontal plane (positive upward). The momentum, mass, and thermodynamic equations can be reduced to two equations in vorticity and temperature departure—departure from the linear base-state temperature profile. These equations are found in Saltzman [1962, Eqs. (16) and (17)].

A double Fourier series form of spectral solution is assumed with three streamfunction modes and four temperature departure modes that are given in Table 3. Saltzman used an alphabetic representation for the seven spectral amplitudes (*A*, *B*, …, *G*), (section 7 in Saltzman 1962), whereas we adopted the representation *X*_{1} to *X*_{7} for the amplitudes.

The following trigonometric arguments appear in the spectral forms: *πax*, *πbx*, and *πcx*, with $a = 1/\sqrt{2}$, $b = 2\sqrt{2}/3$, and $c = \sqrt{2}/6$, where they respectively account for three waves, four waves, and one wave over the nondimensional horizontal length $6\sqrt{2}$ of the domain; the arguments *πz* and 2*πz* represent a half wave and a full wave over the depth of the fluid, where the nondimensional depth-to-length ratio is $1/(6\sqrt{2})$.

Substitution of the spectral forms of solution into the Boussinesq equations in Table 4 leads to a set of coupled ordinary differential equations governing the spectral amplitudes. These equations and associated coefficients are found in Tables 5 and 6. The coefficients in Table 6 are expressed in terms of the two nondimensional numbers associated with this convection problem: the Rayleigh number (*λ*) and the Prandtl number (*σ*). The Rayleigh number is expressed as a constant *λ* multiplying the critical Rayleigh number in Saltzman's development [=657.511 for this case of two free boundaries, as derived in Chandrasekhar (1961), following Rayleigh's pioneering development (Rayleigh 1916)]. Saltzman set *σ* = 10 in his study and we also set *σ* = 10 in ours, but only after we determined that the forecast sensitivity to *σ* was much less than the sensitivity to *λ*. Determination of the relative forecast sensitivities to these parameters dictated our derivation in terms of arbitrary values of the two numbers.

In this study, the true control vector is taken to be *λ* = 2 and Saltzman's specified initial conditions. The forecast control vector is taken to be *λ* = 2.1 and Saltzman's specified initial conditions, except for the temperature-departure amplitude *X*_{4}(0), which is taken to be the nondimensional number 0.1 instead of 0.0. It is sufficient to choose this single amplitude initial condition, different from the true control, to reveal the action of the variational data assimilation process on the correction of the initial condition.

In addition to the spectral equations for convection, two sets of equations governing the forecast sensitivities to control are derived, namely, equations governing ∂*X*_{i}/∂*λ* and ∂*X*_{i}/∂*X*_{4}(0), *i* = 1, 2, …, 7. Thus, there are 21 coupled equations, where the solutions to the *X*_{i}(*t*) equations are coupled with the time evolution of the 14 sensitivity equations, but the solutions to the sensitivity equations do not feed back into the solutions to the *X*_{i}(*t*) equations. The graphs of amplitude variation for true and forecast control are shown in Fig. 10. Figures 11 and 12 show the (*x*, *z*) distribution of the true and forecasted vertical velocity and temperature departure variables (*w*, *θ*) at *t* = 0.65, the time when data assimilation operates. Note that the phase for wavenumber 3 is reversed between the true and forecasted states in Figs. 11 and 12, consistent with the time evolution of amplitudes shown in Fig. 10.
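The one-way coupling behind the 21 equations (state drives sensitivities, not the reverse) follows the standard forward-sensitivity construction. As a hedged stand-in for the 7-mode system, the same construction is shown on the familiar 3-mode Lorenz-63 subset of Saltzman's model; parameters and the Euler step are illustrative.

```python
import numpy as np

# Forward-sensitivity equations on the Lorenz-63 subset of Saltzman's model:
# 3 state equations + 3 for dX/dr + 3 for dX/dX2(0) = 9 coupled equations,
# mirroring the paper's 7 + 14 = 21 (values below are illustrative).
sigma, r, b, dt = 10.0, 28.0, 8.0 / 3.0, 1e-3

def f(x):
    return np.array([sigma * (x[1] - x[0]),
                     r * x[0] - x[1] - x[0] * x[2],
                     x[0] * x[1] - b * x[2]])

def jac(x):                        # Jacobian of f with respect to the state
    return np.array([[-sigma, sigma, 0.0],
                     [r - x[2], -1.0, -x[0]],
                     [x[1], x[0], -b]])

df_dr = lambda x: np.array([0.0, x[0], 0.0])   # partial of f w.r.t. r

x = np.array([1.0, 1.0, 1.0])
S_r = np.zeros(3)                  # dX/dr starts at zero
S_0 = np.array([0.0, 1.0, 0.0])    # dX/dX2(0) starts as a unit vector

for _ in range(2000):              # Euler steps; sensitivities are driven by
    J = jac(x)                     # the state but do not feed back into it
    S_r = S_r + dt * (J @ S_r + df_dr(x))
    S_0 = S_0 + dt * (J @ S_0)
    x = x + dt * f(x)

print("state:", x, "dX/dr:", S_r, "dX/dX2(0):", S_0)
```

The update order matters: the Jacobian is evaluated at the current state before the state itself is advanced.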

#### 4) The experiment

The forward operator $H$, a matrix of dimension 3 × 7 for this model, represents the spatial form of the (*u*, *w*, *θ*) variables in the (*x*, *z*) space (refer to Table 7 for details). When multiplied by the 7 × 1 time-dependent vector $X$ = {*X*_{1}, *X*_{2}, …, *X*_{7}} of Fourier amplitudes, the three rows of the resulting matrix represent the (*x*, *z*, *t*) structure of the (*u*, *w*, *θ*) model variables—the state of convection in space and time. When the $H$ matrix is multiplied by the $F$ matrix, a 7 × 2 matrix of forecast sensitivities (the two columns are the amplitudes of the forecast sensitivity to the initial condition *X*_{4}(0) and to the Rayleigh number, expressed in terms of *λ*, respectively), the result is a 3 × 2 matrix of the (*x*, *z*, *t*) representation of the six sensitivity elements—the separate first derivatives of the variables with respect to the two control parameters. In matrix form, $HX$ is the state vector and $HF$ is the sensitivity matrix. The square of the sensitivity matrix, ($HF$)^{T}($HF$), is the $G$ matrix, and tr($G$) is its trace.
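The matrix algebra of this paragraph can be checked at the shape level; the entries below are random placeholders, not Saltzman amplitudes or the paper's actual $H$.

```python
import numpy as np

# Shape-level sketch of the operators described above (random placeholders;
# the paper's H carries the (u, w, theta) spectral structure of Table 7).
rng = np.random.default_rng(0)
H = rng.standard_normal((3, 7))   # forward operator: 7 amplitudes -> (u, w, theta)
X = rng.standard_normal((7, 1))   # Fourier amplitudes at one (x, z, t) point
F = rng.standard_normal((7, 2))   # sensitivities to [X4(0), lambda]

state = H @ X                     # 3x1 model counterpart of the observations
S = H @ F                         # 3x2 sensitivity matrix HF
G = S.T @ S                       # 2x2 Gramian (HF)^T (HF)
print("tr(G) =", np.trace(G), "= ||HF||_F^2 =", np.linalg.norm(S, "fro")**2)
```

The printed equality is the identity used throughout the paper: the trace of the Gramian equals the squared Frobenius norm of the sensitivity matrix.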

The variational data assimilation process will take place at *t* = 0.65, that point in time where initial perturbations in the Fourier convective components have started to grow significantly. The cost function *J* over the (*x*, *z*) space at this time is also calculated. The trace tr($G$) is plotted in Fig. 13.^{5} A point at the maximum value of the tr($G$) centroid near *x* = 6.0 and *z* = 0.7 will be considered as the location of observations of (*u*, *w*, *θ*)—the maximum is located at *x* = 5.802 and *z* = 0.721.

The two-component gradient of *J*, gradient of *J* with respect to *λ* and the gradient of *J* with respect to the initial condition *X*_{4}(0), is found by taking the derivative of the cost function with respect to these controls separately. These derivatives can be calculated easily since the variables’ sensitivity to control are available from solution to the amplitude equations discussed earlier. The structure of these two gradients over the (*x*, *z*) space at *t* = 0.65 is shown in Figs. 14 and 15.

The values of the gradients at the observation point are ∂*J*/∂*X*_{4}(0) = 4.893 99 × 10^{5} and ∂*J*/∂*λ* = 4.178 00 × 10^{5}. Thus, from the operating point in the space of control [*X*_{4}(0), *λ*] = (0.1, 2.1), the negative gradient of *J* is directed downward at an angle of 49.51° (slope of 1.17). With a step size of 0.150 along this direction, the control adjustment is [Δ*X*_{4}(0), Δ*λ*] = (−0.097, −0.110), close to the optimal adjustment; only a 5° difference exists between the two vectors. The adjusted forecast control, [*X*_{4}(0), *λ*] = (0.003, 1.99), is very close to the true control.
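The step arithmetic above can be checked numerically. The gradient values are copied from the text, but the ordering of the two control components in the printed step and the sign convention are our reading of the paper, so treat the step components as illustrative.

```python
import numpy as np

# Descent-step arithmetic at the observation point (gradient values from the
# text; component ordering in the step is an illustrative assumption).
g = np.array([4.89399e5, 4.17800e5])     # [dJ/dX4(0), dJ/dlambda]
angle = np.degrees(np.arctan2(g[0], g[1]))   # slope 1.17 -> about 49.5 deg
step = 0.150 * (-g / np.linalg.norm(g))      # step of length 0.150 downhill
print(f"descent angle = {angle:.2f} deg, step = {step}")
```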

In summary, the tr($G$) structure identified points in (*x*, *z*) space at *t* = 0.65 that were likely to have large values of the *J* gradient. When observations of the three convection variables (two velocity components and the temperature departure) are placed at one of the centroids of the tr($G$) map, the gradients of the cost function were found to be large and of roughly the same magnitude. The direction of the negative gradient was very close to the optimal direction. Examination of the gradient maps in Figs. 14 and 15 makes it clear that there are many locations for observation points [away from the centroids of tr($G$)] that deliver gradients that are much smaller and not directed toward the *J* minimum, compared with those associated with points near the centroids.

## 7. Discussion and conclusions

Whereas traditional 4D-Var is grounded in use of the adjoint model to deliver the gradient of the cost function—the so-called adjoint sensitivity—the alternative method developed here is founded on the role of forecast sensitivity to control and on an expression that defines the forecast error in terms of the observation error and the model counterpart to observations. The separability of the influences of the model and the observation system on the adjoint sensitivity, as brought out in (4.1), presents a view that enriches the traditional one. We emphasize that since the error **f** is not known in advance, we use the fundamental relation in (4.1) to decide on the placement of the observations, as described above.

The traditional 4D-Var method provides a computationally efficient framework for computing the adjoint gradient for use in forecast error correction. Within this classical framework, the prevailing practice is to use different types of observations without much consideration of their optimal locations. And as the number of observations increases, the computational demands to assimilate them also increase. Our alternative approach presents an offline strategy based on the forward sensitivity field in the control space. By selectively placing observations where the model exhibits large sensitivity, we can in fact increase the effectiveness of the observations without having to use many observations in the assimilation process.

Our approach also clearly shows that it is not simply the magnitude of the adjoint gradient, but its (orthogonal) projection along the unknown control error vector *δ***c** that is solely responsible for the reduction in the forecast error. It is shown that the magnitude of this projection is related to the inner product of the adjoint sensitivity and the unknown error in the control vector. This inner product is directly related to the Frobenius norm of the forward sensitivity of the forecast with respect to the initial conditions and parameters. By examining variations of this norm in space and time, and by placing the observations in locations at or in the vicinity of the maxima of this norm, the magnitude of the projection is increased. The effectiveness of this approach has been demonstrated with a one-dimensional air/sea interaction model (time dependence alone) and a two-dimensional model, Saltzman’s nonlinear model of Bénard (1900, 1901) convection in space and time. The richness of Saltzman’s treatment of convection so faithful to the work of Bénard and Rayleigh was a pedagogical triumph, central to Ed Lorenz’s monumental study of deterministic chaos, and central to our understanding of the interplay between dynamics and observations in data assimilation.

The strategy of choosing individual observations in the vicinity of the Frobenius norm maxima is straightforward and logical, but a methodology that improves the "granularity" (finer resolution) can be attained by the following processes: 1) in the presence of an inordinate number of Frobenius norm maxima, rank order them by the value of the maximum and decide on a suitable number of observations; 2) examine the square of the norm of each of the *n* columns of $U$ against time and space and identify the maxima in each field, then again by rank ordering these maxima decide on a suitable number of observations to use; and 3) the finest level of granularity is attained when we examine the variation of the square of each of the (*n* + *p*) × *m* elements of $\bar{U}$, and identify and rank order the maxima to come up with an acceptable number of observations to use in the assimilation scheme.
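Process 1 above—locating and rank ordering the maxima of a sensitivity field—might be sketched as follows; the tr($G$) time series here is synthetic.

```python
import numpy as np

# Hedged sketch of process 1: find the interior local maxima of a tr(G)
# series, rank order them, and keep the top-k as candidate observation times.
t = np.linspace(0.0, 10.0, 1001)
trG = np.exp(-t) + 0.8 * np.exp(-(t - 4.0) ** 2) + 0.5 * np.exp(-(t - 8.0) ** 2)

interior = (trG[1:-1] > trG[:-2]) & (trG[1:-1] > trG[2:])
peaks = np.where(interior)[0] + 1
ranked = peaks[np.argsort(trG[peaks])[::-1]]   # largest maximum first

k = 2
print("top", k, "observation times:", t[ranked[:k]])
```

Processes 2 and 3 apply the same find-and-rank step to the column norms of $U$ and to the individual elements of $\bar{U}$, respectively.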

We hasten to add that the process of computing the forward sensitivities involves solving time-consuming matrix recurrence equations in (3.2) and (3.3), where each step involves *O*(*n*^{3}) operations (see Lakshmivarahan et al. 2017). There are two avenues to meet this computational challenge. The first is to invoke parallel implementation of the recurrence that defines the forward sensitivity dynamics to gain speedup. A little reflection suggests that Eqs. (3.2) and (3.3) admit "embarrassingly parallel" implementations. Once the Jacobians of the forward operators $D_{M}(k)$ in (3.2) are computed along the forecast trajectory (a common denominator in all the adjoint-based approaches), we can split the matrix–matrix recurrence in (3.2) into a system of *n* matrix–vector recurrences across the *n* columns of $U$(*k*) and implement each of these matrix–vector recurrences in parallel, requiring *O*(*n*^{2}) operations per iteration. Using this approach, the computation time for the parallel implementation would indeed match that of running the tangent linear system. The second idea is to consider a multiscale, multigrid approach wherein we use a relatively small coarse grid to compute the forward sensitivity matrices that are critical for deciding on the location and the number of observations, and then use a finer-scale grid to assimilate the observations. Of course, we also have the choice to hybridize the multiscale, multigrid approach with parallel computing to make the overall process computationally feasible.
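The column split described above can be sketched directly: each column of $U$($k$) obeys an independent matrix–vector recurrence. The Jacobians below are random stand-ins, and `ThreadPoolExecutor` stands in for a real parallel harness.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

# Sketch of the "embarrassingly parallel" split: the matrix recurrence
# U(k+1) = D_M(k) U(k), U(0) = I, is advanced one column at a time, each
# column an independent matrix-vector recurrence (Jacobians are stand-ins).
n, K = 4, 10
rng = np.random.default_rng(1)
DM = [np.eye(n) + 0.01 * rng.standard_normal((n, n)) for _ in range(K)]

def propagate_column(u):
    for k in range(K):
        u = DM[k] @ u              # O(n^2) per step, no cross-column coupling
    return u

with ThreadPoolExecutor() as pool:
    cols = list(pool.map(propagate_column, np.eye(n)))
U_parallel = np.column_stack(cols)

# Reference: the full O(n^3)-per-step matrix-matrix recurrence
U = np.eye(n)
for k in range(K):
    U = DM[k] @ U
print("max column/matrix mismatch:", np.abs(U_parallel - U).max())
```

Each column propagation is exactly one run of the tangent linear system, which is why the parallel wall-clock time matches a tangent-linear integration.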

## Acknowledgments

This paper is a condensed version of a two-part paper (Lakshmivarahan et al. 2019b; Lewis et al. 2019) submitted to *J. Atmos. Sci*. The challenge of combining these two papers was made manageable through the advice and the input from the editor-in-charge and valued suggestions from a large panel of reviewers. We are indebted to these individuals for helping us create a better and a readable manuscript. Preliminary ideas related to the developments in this paper were presented (by S. L.) in a seminar in July 2018 at the Center for Atmospheric and Oceanic Sciences (CAOS) and Divecha Centre for Climate Change (DCCC) at the Indian Institute of Science. We are grateful to all the participants for the interaction and to Ashwin Seshadri, J. Srinivasan, and S. K. Satheesh for the hospitability and keen interest.

### APPENDIX A

#### Dependence of $\Vert\hat{\mathbf{g}}\Vert$ on the Spectral Properties of $G = F^{\mathrm{T}}\bar{H}F$

In this appendix we explore the effectiveness of an observation by analyzing the intrinsic dependence of the projected component $\hat{\mathbf{g}}$ of the adjoint gradient $\mathbf{g} = G\mathbf{f}$ along the control error vector $\mathbf{f} = \delta\mathbf{c}$. We start by listing some of the key properties of the linear transformation defined by $G$.

Property A1: Referring to the developments in section 3 and appendix B, it follows that matrix $G$ is a Gramian given by

$$G = F^{\mathrm{T}}\bar{H}F = \bar{F}^{\mathrm{T}}\bar{F} \tag{A.1}$$

and is SPSD (Meyer 2000).

Property A2: Since $G$ is SPSD, the inner product of **f** and **g** is nonnegative. It follows from the definition of SPSD (Meyer 2000) that

$$\langle\mathbf{f}, \mathbf{g}\rangle = \mathbf{f}^{\mathrm{T}}G\mathbf{f} = \Vert\mathbf{f}\Vert^{2}\,\hat{\mathbf{f}}^{\mathrm{T}}G\hat{\mathbf{f}} \ge 0, \tag{A.2}$$

where $\hat{\mathbf{f}} = \mathbf{f}/\Vert\mathbf{f}\Vert$ is the unit vector in the direction of **f**.

Property A3: Referring to chapter 6 in Lewis et al. (2006), the orthogonal projection matrix $P_{\mathbf{f}}$ along the span of the vector **f** is given by

$$P_{\mathbf{f}} = \frac{\mathbf{f}\mathbf{f}^{\mathrm{T}}}{\Vert\mathbf{f}\Vert^{2}} = \hat{\mathbf{f}}\hat{\mathbf{f}}^{\mathrm{T}}. \tag{A.3}$$

Property A4: The orthogonal projection $\hat{\mathbf{g}}$ of **g** along **f** is given by (Lewis et al. 2006, chapter 6)

$$\hat{\mathbf{g}} = P_{\mathbf{f}}\,\mathbf{g} = \langle\hat{\mathbf{f}}, \mathbf{g}\rangle\,\hat{\mathbf{f}}. \tag{A.4}$$

Property A5: Since **f** and $\hat{\mathbf{f}}$ are fixed, the properties of the scalar

$$\Omega(G) = \hat{\mathbf{f}}^{\mathrm{T}}G\hat{\mathbf{f}}, \tag{A.5}$$

called the Rayleigh coefficient (Meyer 2000), depend critically on the spectral expansion of the SPSD matrix $G$.

Property A6: Since $G$ is SPSD, the spectral expansion of $G$ is given by (Meyer 2000)

$$G = QDQ^{\mathrm{T}}, \tag{A.6}$$

where $Q$ and $D$ are the matrices of eigenvectors and eigenvalues of $G$, respectively, with $QQ^{\mathrm{T}} = Q^{\mathrm{T}}Q = I_{n+p}$. Substituting (A.6) in (A.5) and defining

$$\mathbf{y} = Q^{\mathrm{T}}\hat{\mathbf{f}}, \tag{A.7}$$

we get

$$\Omega(G) = \mathbf{y}^{\mathrm{T}}D\mathbf{y} = \sum_{i=1}^{n+p} d_{i}y_{i}^{2}, \tag{A.8}$$

where **y** is a unit vector since

$$\Vert\mathbf{y}\Vert = \Vert Q^{\mathrm{T}}\hat{\mathbf{f}}\Vert = \Vert\hat{\mathbf{f}}\Vert = 1. \tag{A.9}$$

Stated in other words, the components of **y** are the coordinates of the unit vector $\hat{\mathbf{f}}$ in the new orthogonal coordinate system formed by the columns of $Q$.

In view of (A.9), the Rayleigh coefficient Ω($G$) has a natural interpretation as the weighted average of the eigenvalues of $G$.

A geometric view of the relation among **f**, **g**, and $\hat{\mathbf{g}}$ is given in Fig. 1. Referring to Fig. 1, we can resolve **g** as a sum:

$$\mathbf{g} = \hat{\mathbf{g}} + \mathbf{g}_{\perp}, \tag{A.10}$$

where $\hat{\mathbf{g}}$ is the orthogonal projection of **g** onto **f** and $\mathbf{g}_{\perp} = \mathbf{g} - \hat{\mathbf{g}}$.

The overall effectiveness of the forecast error correction strategy depends critically on $\hat{\mathbf{g}}$ through its length $\Vert\hat{\mathbf{g}}\Vert$. Referring to Fig. 1, we can maximize $\Vert\hat{\mathbf{g}}\Vert$ by maximizing $\Vert\mathbf{g}\Vert$ and minimizing the angle *θ* between **f** and **g**, which is equivalent to maximizing the inner product ⟨**f**, **g**⟩.

Property A8: Combining the eigen decomposition of $G$ in (A.6) with (A.2) and (A.7), we get the following new expression:

$$\mathbf{g} = G\mathbf{f} = \Vert\mathbf{f}\Vert\,QD\mathbf{y}. \tag{A.12}$$

Consequently,

$$\Vert\mathbf{g}\Vert^{2} = \Vert\mathbf{f}\Vert^{2}\sum_{i=1}^{n+p} d_{i}^{2}y_{i}^{2}, \tag{A.13}$$

where the second factor on the right-hand side of (A.13) is the weighted average of the squares of the eigenvalues of $G$.

Hence,

$$\Vert\hat{\mathbf{g}}\Vert^{2} = \langle\hat{\mathbf{f}}, \mathbf{g}\rangle^{2} = \Vert\mathbf{f}\Vert^{2}\left(\sum_{i=1}^{n+p} d_{i}y_{i}^{2}\right)^{2}, \tag{A.15}$$

where the second factor on the right-hand side of (A.15) is the square of the weighted average of the eigenvalues of $G$.

Property A10: Considering $\{y_{i}^{2}\}$ as a discrete probability distribution over the $d_{i}$, define

$$E(d) = \sum_{i=1}^{n+p} d_{i}y_{i}^{2} \quad\text{and}\quad E(d^{2}) = \sum_{i=1}^{n+p} d_{i}^{2}y_{i}^{2}.$$

Then, from

$$\mathrm{Var}(d) = E(d^{2}) - [E(d)]^{2} \ge 0,$$

we obtain the natural inequality

$$\Vert\mathbf{g}\Vert \ge \Vert\hat{\mathbf{g}}\Vert,$$

which states that the length of the vector **g** is always greater than or equal to that of its orthogonal projection $\hat{\mathbf{g}}$.
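Properties A2–A10 can be verified numerically on a random Gramian; this is a sketch with arbitrary dimensions.

```python
import numpy as np

# Numerical check of properties A2-A10 on a random SPSD Gramian: the
# projection of g = G f onto f never exceeds g in length, and both lengths
# are weighted moments of the eigenvalues under the weights y_i^2.
rng = np.random.default_rng(2)
Fbar = rng.standard_normal((6, 4))
G = Fbar.T @ Fbar                       # SPSD Gramian
f = rng.standard_normal(4)
fhat = f / np.linalg.norm(f)

g = G @ f
ghat = (fhat @ g) * fhat                # orthogonal projection of g onto f

d, Q = np.linalg.eigh(G)
y = Q.T @ fhat                          # coordinates of fhat in the eigenbasis
mean_d = np.sum(d * y**2)               # weighted average of eigenvalues
mean_d2 = np.sum(d**2 * y**2)           # weighted average of their squares

print("||g|| >= ||ghat||:", np.linalg.norm(g) >= np.linalg.norm(ghat))
```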

The above analysis naturally leads to the following inescapable conclusions:

Property A11: Increasing $\Vert\hat{\mathbf{g}}\Vert$ has two immediate consequences: (i) it increases the effectiveness of the forecast error correction process, and (ii) it increases $\Vert\mathbf{g}\Vert$, which in turn controls the shape of the cost functional *J*(**c**) by avoiding flat patches. Hence the title of the paper.

As a prelude to increasing $\Vert\hat{\mathbf{g}}\Vert$, from (A.15) first observe that for a given fixed **f**,

$$\sum_{i=1}^{n+p} d_{i}y_{i}^{2} \le \sum_{i=1}^{n+p} d_{i}.$$

From (A.6), exploiting the properties of the trace of product matrices (Meyer 2000), it can be verified that

$$\mathrm{tr}(G) = \sum_{i=1}^{n+p} d_{i} = \mathrm{tr}(\bar{F}^{\mathrm{T}}\bar{F}) = \Vert\bar{F}\Vert_{F}^{2},$$

where $\Vert A\Vert_{F}$ denotes the Frobenius norm of $A$ (Meyer 2000). Hence, by placing observations where the Frobenius norm of the forward sensitivity matrix $\bar{F}$ is a maximum, we can achieve the twin objectives stated in property A11.
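A quick numerical check that tr($G$) equals both the sum of the eigenvalues of $G$ and the squared Frobenius norm of $\bar{F}$, with a random $\bar{F}$ standing in:

```python
import numpy as np

# tr(G) = sum of eigenvalues of G = ||Fbar||_F^2 for G = Fbar^T Fbar
rng = np.random.default_rng(3)
Fbar = rng.standard_normal((5, 3))
G = Fbar.T @ Fbar
d = np.linalg.eigvalsh(G)
print(np.trace(G), d.sum(), np.linalg.norm(Fbar, "fro")**2)
```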

Property A12: From remark 4.1, it is immediate that the matrix $G$ is also the Hessian of *J*(**c**) at the minimum $\bar{\mathbf{c}}$.

Clearly, the condition number of $G$ controls the shape of the cost functional *J*(**c**) around its minimum.

### APPENDIX B

#### Factorization of $G = F^{\mathrm{T}}\bar{H}F$

Recall from (3.14) that $G = F^{\mathrm{T}}\bar{H}F$, where $F = [U, V]$ and $\bar{H} = D_{h}^{\mathrm{T}}(k)R^{-1}D_{h}(k)$. It is assumed that the noise covariance matrix $R$ (and hence $R^{-1}$) is SPD. Accordingly, we can express $R^{-1}$ as the product of its Cholesky factors (Lewis et al. 2006, chapter 9) as

$$R^{-1} = \Sigma^{\mathrm{T}}\Sigma, \tag{B.1}$$

where $\Sigma$ is an upper triangular matrix.

Substituting, we obtain the Gramian structure

$$G = F^{\mathrm{T}}D_{h}^{\mathrm{T}}(k)\,\Sigma^{\mathrm{T}}\Sigma\,D_{h}(k)F = \bar{F}^{\mathrm{T}}\bar{F}, \tag{B.2}$$

where $\bar{F} = \Sigma D_{h}(k)F$.

In the first special case, when $F = U$ (which occurs when $\delta\boldsymbol{\alpha} = \mathbf{0}$), we get

$$G = \bar{U}^{\mathrm{T}}\bar{U}, \tag{B.3}$$

where $\bar{U} = \Sigma D_{h}(k)U$.

In the second special case, when $F = V$ [which occurs when $\delta\mathbf{x}(0) = \mathbf{0}$], we get

$$G = \bar{V}^{\mathrm{T}}\bar{V}, \tag{B.4}$$

where $\bar{V} = \Sigma D_{h}(k)V$.
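The factorization can be verified numerically; $R$, $D_h$, and $F$ below are random stand-ins of compatible shape.

```python
import numpy as np

# Check of the factorization G = Fbar^T Fbar with Fbar = Sigma Dh F,
# where inv(R) = Sigma^T Sigma via a Cholesky-type factor.
rng = np.random.default_rng(4)
m, n = 3, 5
A = rng.standard_normal((m, m))
R = A @ A.T + m * np.eye(m)               # SPD observation-error covariance
Dh = rng.standard_normal((m, n))          # Jacobian of the forward operator
F = rng.standard_normal((n, 2))           # forward sensitivity matrix

L = np.linalg.cholesky(np.linalg.inv(R))  # inv(R) = L L^T, L lower triangular
Sigma = L.T                               # upper-triangular factor: Sigma^T Sigma = inv(R)
Fbar = Sigma @ Dh @ F

G_direct = F.T @ Dh.T @ np.linalg.inv(R) @ Dh @ F
print("factorization holds:", np.allclose(G_direct, Fbar.T @ Fbar))
```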

### APPENDIX C

#### A Comparative Analysis

By way of comparing our approach with those in the adjoint sensitivity literature, we consider the adjoint sensitivity relation given in Eq. (8) in AH2007.

Solving Eq. (3.2) in this paper, it can be verified that the forward sensitivity matrix $U(k)$, given by

$$U(k) = D_{M}(k-1)\,D_{M}(k-2)\cdots D_{M}(0), \tag{C.1}$$

is indeed the same as the resolvent $R(k - 1, 0)$ in Eq. (2) of AH2007. Consequently, Eq. (2) in AH2007, in our notation, takes the form

$$\delta\mathbf{x}(k) = U(k)\,\delta\mathbf{x}(0). \tag{C.2}$$

The induced first variation *δJ* in *J* is then given by

$$\delta J = \langle\nabla_{\mathbf{x}(k)}J,\ \delta\mathbf{x}(k)\rangle, \tag{C.3}$$

where $\nabla_{\mathbf{x}(k)}J \in R^{n}$ is the gradient of *J*[**x**(*k*)] with respect to **x**(*k*).

Combining (C.2) and (C.3) and using the adjoint property, from first principles it follows that the adjoint gradient is given by

$$\nabla_{\mathbf{x}(0)}J = U^{\mathrm{T}}(k)\,\nabla_{\mathbf{x}(k)}J, \tag{C.4}$$

which is Eq. (8) in AH2007.
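The chain from (C.1) to (C.4) can be checked on a toy two-dimensional map; the map and the cost *J* below are illustrative, and the adjoint gradient is compared against a finite-difference reference.

```python
import numpy as np

# Check of (C.1)-(C.4) on a toy nonlinear map: the forward sensitivity U(k)
# (a product of step Jacobians) transposes into the adjoint gradient
# nabla_{x(0)} J = U^T(k) nabla_{x(k)} J.
def step(x):
    return np.array([x[0] + 0.1 * x[1], 0.9 * x[1] + 0.05 * x[0]**2])

def step_jac(x):
    return np.array([[1.0, 0.1], [0.1 * x[0], 0.9]])

x = np.array([1.0, 0.5])
U = np.eye(2)                          # U(0) = I
for _ in range(20):                    # U(k) = D_M(k-1) ... D_M(0)
    U = step_jac(x) @ U                # Jacobian at the current state
    x = step(x)                        # then advance the state

J_grad_xk = 2.0 * x                    # gradient of J = ||x(k)||^2 at x(k)
adj_grad = U.T @ J_grad_xk             # nabla_{x(0)} J via (C.4)

# Finite-difference reference obtained by perturbing x(0)
eps = 1e-6
def J_of_x0(x0):
    z = np.array(x0, dtype=float)
    for _ in range(20):
        z = step(z)
    return z @ z

fd = np.array([(J_of_x0([1.0 + eps, 0.5]) - J_of_x0([1.0, 0.5])) / eps,
               (J_of_x0([1.0, 0.5 + eps]) - J_of_x0([1.0, 0.5])) / eps])
print("adjoint vs finite difference:", adj_grad, fd)
```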

The analysis in AH2007 then proceeds to estimating the adjoint gradient using an ensemble approach. This is accomplished by building a linear regression between the ensemble of induced first variations {*δJ*_{i}|1 ≤ *i* ≤ *M*} and the ensemble of the first variation in the initial conditions {*δx*_{i}(0)|1 ≤ *i* ≤ *M*}, where *M* is the ensemble size. For further details, refer to AH2007.

However, in this paper we take the basic relation in (C.4) a step further by explicitly evaluating $\nabla_{\mathbf{x}(k)}J$. From the structure of *J*(**c**) in Eq. (2.6) of this paper, it can be verified that

$$\nabla_{\mathbf{x}(0)}J = U^{\mathrm{T}}(k)\,\bar{H}\,U(k)\,\delta\mathbf{x}(0), \tag{C.5}$$

which is the equation in (4.1) for the special case when **x**(0) is varied and $\boldsymbol{\alpha}$ is held constant.

It is important to note that while all papers related to adjoint sensitivity analysis in the literature consider sensitivity only with respect to the initial conditions, our work treats sensitivity with respect to both the initial conditions and the parameters simultaneously, on the same footing.

## REFERENCES

*Mon. Wea. Rev.*

*Rev. Gen. Sci. Pures Appl.*

*Ann. Chim. Phys.*

*J. Atmos. Sci.*

*Geofys. Publ.*

*Bull. Amer. Meteor. Soc.*

*Hydrodynamic and Hydromagnetic Stability*

*Experimental Designs*

*Jacob Aall Bonnevie Bjerknes (1897–1975): Biographical Memoir*

*Paleoceanography*

*J. Atmos. Oceanic Technol.*

*Quart. J. Roy. Meteor. Soc.*

*Adv. Meteor.*

*J. Atmos. Sci.*

*Forecast Error Correction Using Dynamic Data Assimilation*

*J. Atmos. Sci.*

*Tellus*

*Bull. Amer. Meteor. Soc.*

*Tellus*

*Tellus*

*Dynamic Data Assimilation: A Least Squares Approach*

*J. Atmos. Sci.*

*The Essence of Chaos*

*J. Atmos. Sci.*

*Bull. Amer. Meteor. Soc.*

*J. Fluid Mech.*

*IEEE Control Syst. Mag.*

*Matrix Analysis and Applied Linear Algebra*

*Stable Adaptive Systems*

*Philos. Mag.*

*J. Atmos. Sci.*

*Mon. Wea. Rev.*

## Footnotes

^{1}

If $A$ ∈ *R*^{m×n}, then the symmetric matrices $AA$^{T} and $A$^{T}$A$ are known as the Gramians of $A$.

^{2}

The term adjoint sensitivity or gradient derives its name from the adjoint method used to compute it. Refer to Lewis et al. (2006, chapters 22–25) for details.

^{3}

If *f*: *R*^{n} → $R$ is continuously differentiable, then $\delta f = \langle\nabla f, \delta\mathbf{x}\rangle$ (Lewis et al. 2006, appendix C).

^{4}

Let $A$ ∈ *R*^{n×n}, then the null space NULL(**A**) of the matrix $A$ is a subspace of *R*^{n} consisting of all vectors **x** ∈ *R*^{n} such that $A$**x** = **0**.

^{5}

The graphic of the Frobenius norm has structure similar to trace $G$ but much greater magnitude. The relative difference in magnitudes is not a factor in identifying locations where sensitivities are large.

Synoptic–Dynamic Meteorology and Weather Analysis and Forecasting,Meteor. Monogr., No. 55, Amer. Meteor. Soc., 147–161