## 1. Introduction

The classic adjoint formulations were generalized by Xu (1996a,b, 1997a; hereafter X96a, X96b, and X97a) for physical processes with parameterized discontinuities. According to the recent study of Xu (1997b, hereafter X97b), in order to apply the generalized adjoint formulations to time-discrete numerical models, the traditional time discretization scheme needs to be modified so that the switch time is determined by interpolation as a continuous function of the initial state. Otherwise, the discrete solution does not depend continuously on the initial state and, consequently, the cost function contains zigzag discontinuities and its gradient contains delta functions. These delta functions accurately describe the local jumps of the cost function with respect to infinitesimal perturbations of the initial state, but they cannot describe the “nonlocal” variations of the cost function with respect to finite perturbations of the initial state. Since practical adjoint applications consider finite perturbations, the central problem here is how to estimate the coarse-grain gradient for the nonlocal variations of the cost function. This problem can be solved, as proposed in X97b, by modifying the traditional time discretization so that the computed gradient contains no delta function and can be used for nonlocal variations. This approach is clean but requires additional work to modify the existing forward model. If the forward model is not modified, then an alternative approach is needed to compute the coarse-grain gradient. The related problems are examined in this paper.

The paper is organized as follows. The analytical benchmark model and two types of discrete forward models are reviewed in the following section. Detailed error analyses are performed in section 3 for four types of tangent linear models derived from the two discrete forward models. Adjoint models and problems in gradient check are examined in section 4. The results are summarized with conclusions in section 5.

## 2. Review of analytical model and two types of discrete forward models

### a. Analytical model and benchmark solutions

In the analytical model (2.1), *G* and *F* are constants, *H*( ) is the Heaviside unit-step function (see p. 622 of Courant and Hilbert 1962), and *x*_{c} is the threshold value for the parameterized process. As shown in X96a, *H*(*x* − *x*_{c}) can be replaced by *H*(*t* − *τ*) for an on-switch or by *H*(*τ* − *t*) for an off-switch, where *τ* denotes the switch time.

The generalized tangent linear equation of (2.1) is given by *d*_{t}*δx* = *δx* *H*′(*t* − *τ*)*G*/*F* [see (3.7) of X96a or (2.3a) of X97b]. As shown in X96a, the solution of this generalized tangent linear equation is the first-order approximation of the nonlinear perturbation Δ*x* = *x*′ − *x*, where *x*′ is the perturbed solution obtained from (2.1) with *x*_{0} replaced by *x*_{0} + *δx*_{0}. The relative difference (RD) between *δx* and Δ*x* can be measured by (2.2). Since *δx* and Δ*x* are functions of *δx*_{0}, RD is also a function of *δx*_{0}. As shown by the solid curve in Fig. 1 (with *F* = 1, *G* = −0.5, *x*_{c} = 0.38, *x*_{0} = 0, and *T* = 1), RD ≈ *O*(*δx*_{0}), so the generalized tangent linear solution *δx* is a valid linear approximation of the nonlinear perturbation Δ*x*.
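As a rough numerical illustration of this comparison, the sketch below assumes that (2.1) has the form *d*_{t}*x* = *F* + *G* *H*(*x* − *x*_{c}) with *x* = *x*_{0} at *t* = 0. This form is inferred from the tangent linear equation quoted above and is an assumption, as is the statement that integrating the generalized tangent linear equation across the switch multiplies *δx* by (1 + *G*/*F*). All function names are hypothetical.

```python
# Assumed form of (2.1): dx/dt = F + G*H(x - x_c), x(0) = x0 (an inference, not quoted).
F, G, X_C, T = 1.0, -0.5, 0.38, 1.0  # parameter values quoted for Fig. 1


def analytic_x(t, x0):
    """Closed-form solution for an on-switch, valid for F > 0, F + G > 0, x0 < x_c."""
    tau = (X_C - x0) / F  # switch time, a continuous function of x0
    if t < tau:
        return x0 + F * t
    return X_C + (F + G) * (t - tau)


def nonlinear_perturbation(dx0, x0=0.0, t=T):
    """Delta x = x' - x, where x' starts from x0 + dx0."""
    return analytic_x(t, x0 + dx0) - analytic_x(t, x0)


def generalized_tl(dx0):
    """delta x after the switch, assuming the jump factor (1 + G/F)."""
    return dx0 * (1.0 + G / F)


def conventional_tl(dx0):
    """Solution of d(delta x)/dt = 0."""
    return dx0


if __name__ == "__main__":
    for dx0 in (1e-1, 1e-2, 1e-3):
        print(dx0, nonlinear_perturbation(dx0),
              generalized_tl(dx0), conventional_tl(dx0))
```

Under these assumptions the generalized solution agrees with Δ*x* at *t* = *T*, whereas the conventional solution *δx* = *δx*_{0} does not. The sketch compares values at the final time only and does not evaluate the RD measure of (2.2).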

The conventional tangent linear equation of (2.1) is given by *d*_{t}*δx* = 0 (see case 1 in section 3a of X96a), and the solution is given by *δx* = *δx*_{0}. The relative difference (RD) between this solution *δx* = *δx*_{0} and the nonlinear perturbation Δ*x* is shown by the dashed curve in Fig. 1. Unlike the solid curve, this dashed curve is far above zero even when *δx*_{0} becomes zero, so the conventional tangent linear solution is not a valid linear approximation of Δ*x.*

In the cost function *J* defined by (2.3), *D* = *x* − *x*_{ob}, *x* is the solution of (2.1), and *x*_{ob} is the observed value of *x*, which is error free and given by the analytical solution of (2.1) with *x*_{0} = 0. Substituting the solution of (2.1) into (2.3) gives an analytical expression of the cost function [see (6.7) and Fig. 9 of X96a]. The gradient of this cost function, ∂*J*/∂*x*_{0}, is plotted by the solid curve in Fig. 2. This gradient can be exactly derived by the backward integration of the generalized adjoint equation [see (3.8) of X96a or section 2 of X97b], but not by the backward integration of the conventional adjoint equation [see (3.3) of X96a]. The conventional adjoint solution is plotted by the dashed curve in Fig. 2, which is clearly different from the true gradient (solid curve). These analytical results will be used as benchmarks for the numerical experiments in the subsequent sections.

### b. FM0—Traditional time discretization of the forward model

In the traditional time discretization of (2.1), denoted FM0 and given by (2.4), Δ*t* = *T*/*N* is the time step. Here, an on-switch is triggered at the *m*th level immediately after the threshold condition is exceeded, and this *m*th time level is determined by *x*_{m} ≥ *x*_{c} > *x*_{m−1}.

In the associated discrete cost function *J*_{d}, *D*_{n} = *x*_{n} − *x*_{obn} and *x*_{obn} is the observed value of *x* at the *n*th time level, which, as in (2.3), is error free and given by the analytical solution of (2.1) with *x*_{0} = 0. As explained in X97b, the FM0 solution does not depend continuously on the initial state *x*_{0} (see Fig. 2 of X97b), and this causes zigzag discontinuities in *J*_{d} (see the dashed curve in Fig. 3). As these discontinuities cause delta functions in the gradient (see Fig. 3 of X97b), the derived gradient ∂*J*_{d}/∂*x*_{0} can be very different from the analytical one ∂*J*/∂*x*_{0}.
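The discontinuous dependence of FM0 on *x*_{0} can be illustrated with a minimal sketch. It assumes that FM0 is a forward-Euler step of *d*_{t}*x* = *F* + *G* *H*(*x* − *x*_{c}), with the threshold tested on the previous time level so that the switch level *m* satisfies *x*_{m} ≥ *x*_{c} > *x*_{m−1}. Both the model form and the function name `fm0` are assumptions made for illustration, and *x*_{0} < *x*_{c} is assumed.

```python
# Hypothetical forward-Euler sketch of FM0 (assumes x0 < x_c).
F, G, X_C, T = 1.0, -0.5, 0.38, 1.0


def fm0(x0, n_steps):
    """Return (x_N, m), where m is the level at which the on-switch is triggered."""
    dt = T / n_steps
    x, m = x0, None
    for n in range(1, n_steps + 1):
        switched = x >= X_C          # threshold tested on x_{n-1}
        x = x + dt * (F + G if switched else F)
        if m is None and x >= X_C:
            m = n                    # first level with x_m >= x_c > x_{m-1}
    return x, m


if __name__ == "__main__":
    # Two initial states on either side of a discontinuity of x_N(x0) for N = 10.
    for x0 in (0.079, 0.081):
        print(x0, fm0(x0, 10))
```

With *N* = 10, moving *x*_{0} from 0.079 to 0.081 shifts the switch level from *m* = 4 to *m* = 3, and *x*_{N} changes by the smooth increment 0.002 plus a jump of *G*Δ*t* = −0.05.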

### c. FM1—Modified time discretization of the forward model

The above problem can be solved if the traditional discretization is modified with the switch time determined by linear interpolation as a continuous function of the initial state. The modified forward model (FM1) is given in (5.1) of X97b. Since the FM1 solution is continuously dependent on the initial state *x*_{0}, the cost function *J*_{d} now is a continuous function of *x*_{0} (see the solid curve in Fig. 3). As shown in section 5 of X97b, when Δ*t* → 0, *J*_{d} → *J* and ∂*J*_{d}/∂*x*_{0} → ∂*J*/∂*x*_{0}, so ∂*J*_{d}/∂*x*_{0} is a good approximation of the analytical gradient ∂*J*/∂*x*_{0}.
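A minimal sketch of such a modification follows. It is not the scheme (5.1) of X97b, which is not reproduced here. It only assumes the same model form *d*_{t}*x* = *F* + *G* *H*(*x* − *x*_{c}) and splits the switching step at a linearly interpolated switch time. The name `fm1` and the variable `frac` are hypothetical.

```python
# Hypothetical sketch of a modified discretization with an interpolated switch time.
F, G, X_C, T = 1.0, -0.5, 0.38, 1.0


def fm1(x0, n_steps):
    """Step like forward Euler, but split the switching step at the interpolated tau."""
    dt = T / n_steps
    x = x0
    for _ in range(n_steps):
        if x >= X_C:                     # already switched on
            x += dt * (F + G)
        elif x + dt * F < X_C:           # still below the threshold after this step
            x += dt * F
        else:                            # the switch occurs inside this step
            frac = (X_C - x) / (dt * F)  # fraction of the step spent before tau
            x = X_C + (1.0 - frac) * dt * (F + G)
    return x


if __name__ == "__main__":
    for x0 in (0.079, 0.081):
        print(x0, fm1(x0, 10))
```

In this sketch *x*_{N} varies continuously with *x*_{0}: moving *x*_{0} from 0.079 to 0.081 changes *x*_{N} by 0.002(1 + *G*/*F*) = 0.001, with no jump.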

## 3. Four types of tangent linear models

### a. CTLM0—Conventional tangent linearization of FM0

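The conventional tangent linearization of FM0 yields CTLM0:

$$\delta x_n = \delta x_{n-1}, \qquad n = 1, 2, \ldots, N. \tag{3.1}$$

The CTLM0 solution is trivial, *δx*_{n} = *δx*_{0}, and is very different from the nonlinear perturbation Δ*x*_{n} = *x*′_{n} − *x*_{n}, where *x*′_{n} is the perturbed FM0 solution obtained with *x*_{0} replaced by *x*_{0} + *δx*_{0}. The relative difference between *δx*_{n} and Δ*x*_{n} can be measured by *RD*_{d}, defined in (3.2).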

As shown by the dotted curve in Fig. 4a (where *N* = 10 and Δ*t* = 0.1), *RD*_{d} = 0 when the initial perturbation *δx*_{0} is within a small range (determined by *F*Δ*t* = 0.1) in the vicinity of zero. In this case, the initial perturbation *δx*_{0} is too small to cause the switch time to jump from one time level to the next (see Fig. 3 of X97b), so Δ*x*_{n} = *δx*_{n} = *δx*_{0} and *RD*_{d} = 0. When *δx*_{0} moves beyond this small range, the switch time jumps, and Δ*x*_{n} and *RD*_{d} jump with it. As shown in Fig. 4a, the dotted *RD*_{d} curve jumps each time *δx*_{0} changes by *F*Δ*t*. According to (2.4) the jump of Δ*x*_{n} is proportional to *G*Δ*t*, and according to (3.2) the jump of *RD*_{d} is proportional to *G*Δ*t*/*δx*_{0}. When Δ*t* is small, as shown in Fig. 4b, the jumps of the dotted *RD*_{d} curve become small and densely packed. When Δ*t* → 0, the dotted *RD*_{d} curve approaches (for *δx*_{0} ≠ 0) the analytical *RD* curve, which is far above zero (shown by the thin solid line in Fig. 4b and the dashed line in Fig. 1). Thus, when |*δx*_{0}| ≥ *O*(*F*Δ*t*), the CTLM0 solution is not a valid approximation of the nonlinear perturbation obtained from FM0. When |*δx*_{0}| < *O*(*F*Δ*t*) and *RD*_{d} = 0, the CTLM0 solution is the same as the nonlinear perturbation obtained from FM0, but the latter is not a valid discrete approximation of the analytical nonlinear perturbation obtained from (2.1).

### b. CTLM1—Conventional tangent linearization of FM1

The conventional tangent linearization of FM1 yields CTLM1, which has the same form as CTLM0 in (3.1). The CTLM1 solution is also a trivial one: *δx*_{n} = *δx*_{0}. The relative difference *RD*_{d} between this CTLM1 solution and the nonlinear perturbation Δ*x*_{n} obtained from FM1 (instead of FM0) is plotted by the dashed curves in Figs. 4a,b. These dashed *RD*_{d} curves are very close to the analytical *RD* curve for the conventional tangent linearization (shown by thin solid line in Fig. 4 and dashed line in Fig. 1). That these curves are far above zero indicates that CTLM1 is not a valid tangent linear model of FM1.

### c. GTLM1—Generalized tangent linearization of FM1

The generalized tangent linearization of FM1 yields GTLM1, and the detailed formulation is given in (5.5) of X97b. The relative difference *RD*_{d} between the GTLM1 solution and the nonlinear perturbation Δ*x*_{n} obtained from FM1 is plotted by the dashed curve in Fig. 5. This dashed *RD*_{d} curve is very close to the analytical *RD* curve for the generalized tangent linearization (shown by thin solid line in Fig. 5 and thick solid line in Fig. 1). Since *RD*_{d} ≈ *RD* ≈ *O*(*δx*_{0}), the GTLM1 solution is a good linear approximation of the nonlinear perturbation obtained from FM1.

### d. GCTLM0—Generalized coarse-grain tangent linearization of FM0

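GCTLM0 is obtained by directly discretizing the analytical form of the generalized tangent linear equation, without modifying the traditional discretization of FM0. Because *H*′(*t* − *τ*) vanishes away from the switch time, only the step that contains the switch (between time levels *m* − 1 and *m*) differs from CTLM0.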

The relative difference (*RD*_{d}) between the GCTLM0 solution and the nonlinear perturbation obtained from FM0 is plotted by the dotted curves in Fig. 5 for Δ*t* = 0.01. As shown, *RD*_{d} has large jumps when |*δx*_{0}| ⩽ *O*(*F*Δ*t*). These jumps are caused by the noncontinuous dependence of the nonlinear perturbation on the initial perturbation *δx*_{0}. The situation is similar to that in Fig. 4b, except that the jumps in Fig. 4b are rooted in a different analytical *RD* curve, which is far above zero (see also the dashed curve in Fig. 1). As |*δx*_{0}| increases beyond *O*(*F*Δ*t*), the jumps on the *RD*_{d} curve diminish rapidly and the curve becomes very close to the solid analytical *RD* curve. This implies that GCTLM0 can be used as a coarse-grain tangent linear model of FM0 when the finite perturbation of concern is sufficiently larger than the change of the FM0 solution caused by a one-step jump of the switch time, that is, |*x*′_{0} − *x*_{0}| ≫ *F*Δ*t* (see Fig. 3 of X97b). The adjoint of GCTLM0 can be used to compute the nonlocal gradient for the coarse-grain geometry of the cost function *J*_{d} constrained by FM0. The related problems are examined in the next section.
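The coarse-grain behavior described above can be sketched numerically. The code assumes a forward-Euler form of FM0 for *d*_{t}*x* = *F* + *G* *H*(*x* − *x*_{c}) and a hypothetical discretization of the generalized tangent linear equation in which *δx* is multiplied by (1 + *G*/*F*) at the switch level *m* and is otherwise unchanged. This need not coincide with the paper's GCTLM0 formulation, and the names `fm0` and `gctlm0` are illustrative.

```python
# Hypothetical sketch: FM0 perturbations versus a coarse-grain tangent linear model.
F, G, X_C, T = 1.0, -0.5, 0.38, 1.0


def fm0(x0, n_steps):
    """Forward-Euler sketch of FM0 (assumes x0 < x_c); returns (x_N, m)."""
    dt = T / n_steps
    x, m = x0, None
    for n in range(1, n_steps + 1):
        switched = x >= X_C
        x = x + dt * (F + G if switched else F)
        if m is None and x >= X_C:
            m = n
    return x, m


def gctlm0(dx0, m, n_steps):
    """Assumed coarse-grain model: delta x jumps by the factor (1 + G/F) at level m."""
    dx = dx0
    for n in range(1, n_steps + 1):
        if n == m:
            dx *= 1.0 + G / F
    return dx


if __name__ == "__main__":
    n_steps, x0 = 1000, 0.1005          # dt = 0.001, so F*dt = 0.001
    x_ref, m = fm0(x0, n_steps)
    for dx0 in (2e-4, 5e-2):            # below and well above F*dt
        x_pert, _ = fm0(x0 + dx0, n_steps)
        print(dx0, x_pert - x_ref, gctlm0(dx0, m, n_steps))
```

For *δx*_{0} = 5 × 10^{−2} ≫ *F*Δ*t* the two perturbations agree closely in this sketch, whereas for *δx*_{0} = 2 × 10^{−4} < *F*Δ*t* the switch level does not move and the FM0 perturbation equals *δx*_{0} instead.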

## 4. Adjoint models and problems in gradient check

The accuracy of a tangent linear model determines the accuracy of its adjoint. In this sense, the results presented for the four types of tangent linear models in the previous section have the following implications for their associated adjoint models. When the cost function *J*_{d} is constrained by FM1, the gradient ∂*J*_{d}/∂*x*_{0} can be accurately computed by the adjoint of GTLM1, but not by the adjoint of CTLM1. When the cost function *J*_{d} is constrained by FM0, the local gradient ∂*J*_{d}/∂*x*_{0} can be computed by the adjoint of CTLM0 if the initial state *x*_{0} is not at a discontinuous point of the cost function *J*_{d}(*x*_{0}). The gradient computed by this adjoint, however, is local and does not describe the coarse-grain geometry of the cost function constrained by FM0. On the other hand, the gradient computed by the adjoint of GCTLM0 is “nonlocal” and describes the coarse-grain geometry of the cost function constrained by FM0. For the example considered in this paper, the adjoint of GCTLM0 is exactly the same as the adjoint of GTLM1, so their solutions give the same gradient—the gradient of the cost function constrained by FM1. This cost function (shown by the solid curve in Fig. 3) now is treated by the adjoint of GCTLM0 as the coarse-grain geometry of the cost function constrained by FM0 (shown by the dashed curve in Fig. 3).

In the gradient check (4.1), *u* = (∂*J*_{d}/∂*x*_{0})|∂*J*_{d}/∂*x*_{0}|^{−1} and 0 < *α* ≪ 1. When *α* is small but not as small as the machine round-off error, one should expect that Φ(*α*) = 1 + *O*(*α*), and the result should not depend strongly on *x*_{0}. This, however, is not the case for the gradient computed by the adjoint of CTLM0. As shown in Fig. 6, when the gradient is computed at *x*_{0} = 0.1 by the adjoint of CTLM0 with Δ*t* = 0.001, the Φ(*α*) curve (solid) stays close to 1 over the wide range of 10^{−12} < *α* < 4 × 10^{−3}, but drops suddenly as *α* increases to 4 × 10^{−3}. This problem is caused by the noncontinuous dependence of the FM0 solution on the initial condition *x*_{0} + *αu*. There is no jump in the FM0 solution until *α* reaches 4 × 10^{−3}. Once *α* increases to 4 × 10^{−3}, the switch time jumps to the next discrete time level, and thus the solid Φ(*α*) curve drops suddenly. When the time step becomes as small as Δ*t* = 0.000 01, the Φ(*α*) curve (dotted) drops as early as when *α* increases to 5 × 10^{−5}. The switch time is affected not only by *α* and Δ*t* but also by the initial state *x*_{0}. When *x*_{0} is very close to a discontinuous point of *J*_{d}(*x*_{0}), a very small *α* can cause the switch time to jump. For example, when *x*_{0} = 0.129 998 and Δ*t* = 0.001, the Φ(*α*) curve (dashed line in Fig. 6) drops dramatically as soon as *α* increases to 10^{−5}. Thus, depending on the reference initial state *x*_{0}, the gradients computed by the adjoint of CTLM0 can give very different Φ(*α*) curves.
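The sudden drop of Φ(*α*) can be reproduced qualitatively with a toy example. The sketch assumes that (4.1) has the standard form Φ(*α*) = [*J*_{d}(*x*_{0} + *αu*) − *J*_{d}(*x*_{0})]/(*αu* ∂*J*_{d}/∂*x*_{0}), and it uses a hypothetical zigzag cost (a smooth quadratic plus small regularly spaced jumps) in place of the FM0-constrained *J*_{d}. The constants `STEP` and `JUMP` and all function names are invented for illustration and do not reproduce Fig. 6.

```python
import math

STEP, JUMP = 1e-3, -5e-5  # hypothetical spacing and size of the zigzag jumps


def smooth_cost(x):
    """Toy smooth cost with gradient x."""
    return 0.5 * x * x


def zigzag_cost(x):
    """Toy cost with a jump every STEP; its local gradient away from jumps is x."""
    return smooth_cost(x) + JUMP * math.floor(x / STEP)


def phi(cost, grad, x0, alpha):
    """Assumed standard gradient-check ratio for a scalar control variable."""
    u = grad / abs(grad)  # unit vector along the gradient
    return (cost(x0 + alpha * u) - cost(x0)) / (alpha * u * grad)


if __name__ == "__main__":
    x0 = 0.1005           # midway between two jumps of the toy cost
    local_grad = x0       # local gradient, blind to the jumps
    for alpha in (1e-6, 1e-5, 1e-4, 1e-3):
        print(alpha, phi(zigzag_cost, local_grad, x0, alpha))
```

In this toy setting Φ(*α*) stays near 1 while *x*_{0} + *αu* remains between two jumps and falls well below 1 once the perturbation crosses a jump, which mirrors the mechanism described for the adjoint of CTLM0.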

The gradient computed by the adjoint of GTLM1 can be checked by (4.1) to great precision, and the result is shown by the solid Φ(*α*) curve in Fig. 7. This result is independent of *x*_{0}. The gradient computed by the adjoint of GCTLM0, however, cannot be accurately checked by (4.1). As shown by the dashed curve in Fig. 7, Φ(*α*) (computed with Δ*t* = 0.001) is significantly larger than 1 when *α* < 10^{−2}. However, when *α* is between 10^{−2} and 5 × 10^{−1}, this dashed Φ(*α*) curve is much closer to 1 than the three Φ(*α*) curves in Fig. 6. Clearly, there is a finite range of *α* in which the Φ(*α*) curve computed by the adjoint of GCTLM0 is much closer to 1 than that computed by the adjoint of CTLM0. This finite range expands leftward (to 5 × 10^{−5}) as Δ*t* decreases (to 0.000 01), as shown by the dotted Φ(*α*) curve in Fig. 7. When the initial perturbation *δx*_{0} is in this finite range, the variation of *J*_{d}, that is, Δ*J*_{d} = *J*_{d}(*x*_{0} + *αu*) − *J*_{d}(*x*_{0}), can be well estimated by using the adjoint of GCTLM0 but not the adjoint of CTLM0. Thus, the former can be considered as a coarse-grain adjoint for FM0 and used to compute the coarse-grain gradient of the zigzag-discontinuous cost function constrained by FM0. This coarse-grain property is tied up with the coarse-grain property of GCTLM0 illustrated in the previous section.

## 5. Conclusions

When parameterized on/off switches are triggered at discrete time levels by a threshold condition in a numerical model, the switch time and thus the model solution are not continuously dependent on the initial state. This causes problems in tangent linearization of the discrete model and in computations of the cost function gradient (since the cost function contains small zigzag discontinuities). As shown in X97b, the problems can be solved by modifying the traditional time discretization. As a supplement of X97b, this study shows that the problems can be avoided by introducing coarse-grain tangent linearization and adjoint without modifying the traditional time discretization. The new results obtained in this paper are summarized as follows:

A coarse-grain tangent linear model can be derived by directly discretizing the analytical form of the generalized tangent linear equation without modifying the traditional discretization in the forward model. The coarse-grain tangent linear solution can be a valid approximation of the nonlinear perturbation obtained from the forward numerical model as long as the initial perturbation is sufficiently large to move the switch time through a large number of time levels (but not too large to cause severe nonlinearity).

The adjoint of the coarse-grain tangent linear model is a coarse-grain adjoint model. Both the coarse-grain adjoint and the conventional adjoint have problems with gradient check [see (4.1)], but the problems occur over different ranges of perturbation amplitude (see Figs. 6–7). When the time step is sufficiently small (Δ*t* ≪ |*x*′_{0} − *x*_{0}|/*F*; see Fig. 3 of X97b), the coarse-grain adjoint model can be used to compute the coarse-grain gradient of the zigzag-discontinuous cost function constrained by the traditionally discretized model.

## Acknowledgments

This work was supported by NOAA Grant NA67Jo150 and NSF Grant ATM-9417304 to CIMMS and NSF Grant ATM91-20009 to CAPS at the University of Oklahoma and by the Office of Naval Research (ONR), Program Element 0602435N to the Marine Meteorology Division of the Naval Research Laboratory at Monterey.

## REFERENCES

Courant, R., and D. Hilbert, 1962: *Methods of Mathematical Physics.* Vol. II. Interscience, 830 pp.

Xu, Q., 1996a: Generalized adjoint for physical processes with parameterized discontinuities. Part I: Basic issues and heuristic examples. *J. Atmos. Sci.,* **53,** 1123–1142.

Xu, Q., 1996b: Generalized adjoint for physical processes with parameterized discontinuities. Part II: Vector formulations and matching conditions. *J. Atmos. Sci.,* **53,** 1143–1155.

Xu, Q., 1997a: Generalized adjoint for physical processes with parameterized discontinuities. Part III: Multiple threshold conditions. *J. Atmos. Sci.,* **54,** 2713–2721.

Xu, Q., 1997b: Generalized adjoint for physical processes with parameterized discontinuities. Part IV: Problems in time discretization. *J. Atmos. Sci.,* **54,** 2722–2728.