## 1. Introduction

The goal of numerical weather prediction is to produce the most accurate forecasts possible. Once the equations are chosen and written in a form appropriate for computer solution, a global forecast depends on the initial conditions and model parameters. Four-dimensional variational data assimilation (4DVAR) chooses some or all of these input variables objectively. With 4DVAR, one begins by choosing a nonnegative forecast error function (also called a cost function), typically a quadratic form of the model state. Then one applies the calculus of variations to minimize the cost function by varying the input variables (also called control variables). The forecast model (called the assimilation model) serves as a strong constraint within the four-dimensional verification space of time and three-dimensional physical or Fourier-transformed space.

The observational term of the cost function, *J*_{o}, is defined in (1) in terms of the model forecast **x**(*t*_{i}) = *M*_{i}(**x**_{0}) and observations **y**_{obs} in the time window between *t*_{0} and *t*_{0} + *t*_{R}, where **x**_{0} includes the model state variable at the initial time *t*_{0} and model parameters, *M*_{t} denotes the forecast model operator that produces the state at time *t* from an initial condition at time *t*_{0} (*t* > *t*_{0}), **y**_{obs} is a vector of observations, and R is an observational error covariance matrix. Assume that *J*_{o} is a continuously differentiable function of the control variables. Then a perturbation analysis of *J*_{o} yields

$$\delta J_o = (\nabla J_o)^{\mathrm{T}}\,\delta\mathbf{x}_0, \tag{2}$$

where ∇*J*_{o} is a column vector containing the gradient of the observational term of the cost function with respect to the control variables and *δ***x**_{0} is a perturbation of the control variables. Determination of the sensitivity of a forecast response to variations in initial conditions, or the iterative determination of initial conditions that gradually reduces the values of a cost function, requires knowledge of the gradient of the forecast response or the cost function with respect to the control variables.

The gradient ∇*J*_{o} can be found through a backward integration of the adjoint model from *t*_{R} to *t*_{0}. Here 𝗠 = ∂*M*(**x**_{0})/∂**x**_{0} is the tangent linear model (TLM) operator of the original forecast model (called the forward nonlinear model), and 𝗠^{T} is the adjoint model operator. Based on the same assumption of differentiability of the model solution and *J*_{o}, the perturbation analysis provides two procedures to check the correctness of the tangent linear model and the gradient of *J*_{o} (Courtier and Talagrand 1987; Navon et al. 1992). Once the values of *J*_{o} and its gradient ∇*J*_{o} with respect to the control variables can be calculated, as well as the total cost function *J* and its gradient ∇*J*, an unconstrained minimization algorithm (Liu and Nocedal 1989; Zou et al. 1993a) can be employed to find the “optimal” values of the control variables that minimize *J*.
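The gradient check mentioned above can be illustrated on a toy problem. The sketch below assumes a linear forecast model x_{i+1} = M x_i and a quadratic observational cost of the common form J_o = ½ Σ_i (H x_i − y_i)^T R^{−1} (H x_i − y_i). This form, the matrices, and the names `cost_and_gradient` and `phi` are illustrative assumptions, not the paper's Eq. (1) or any operational code. The adjoint sweep accumulates the gradient backward in time, and the check looks at whether [J(x_0 + a h) − J(x_0)] / (a h^T ∇J) approaches 1 as a decreases.

```python
import numpy as np

def cost_and_gradient(x0, M, H, Rinv, obs):
    """Return J_o and its gradient w.r.t. x0 for the toy model x_{i+1} = M x_i."""
    # Forward sweep: integrate the model and store the weighted innovations.
    x = x0.copy()
    J = 0.0
    forcings = []
    for y in obs:
        x = M @ x
        d = H @ x - y
        J += 0.5 * d @ Rinv @ d
        forcings.append(H.T @ Rinv @ d)
    # Backward (adjoint) sweep from the last observation time to t_0.
    adj = np.zeros_like(x0)
    for forcing in reversed(forcings):
        adj = M.T @ (adj + forcing)
    return J, adj

# Hypothetical setup: 4 state variables, 2 observed, 5 observation times.
rng = np.random.default_rng(0)
n = 4
M = np.eye(n) + 0.1 * rng.standard_normal((n, n))
H = np.eye(n)[:2]
Rinv = np.diag([1.0, 4.0])
x_true = rng.standard_normal(n)
obs, x = [], x_true.copy()
for _ in range(5):
    x = M @ x
    obs.append(H @ x)

x0 = x_true + 0.5 * rng.standard_normal(n)
J0, g = cost_and_gradient(x0, M, H, Rinv, obs)
h = g / np.linalg.norm(g)          # perturbation direction

def phi(a):
    """Ratio of the finite change in J to its first-order (gradient) estimate."""
    return (cost_and_gradient(x0 + a * h, M, H, Rinv, obs)[0] - J0) / (a * (h @ g))

ratios = [phi(10.0 ** -k) for k in range(1, 8)]
for k, r in enumerate(ratios, start=1):
    print(f"a = 1e-{k}: phi = {r:.10f}")
```

For a differentiable cost function the printed ratio tends to 1 over a range of small a; the same test applied across an on–off switch would not behave this way, which is the situation examined in the rest of the paper.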

When the assimilation model is diabatic, containing parameterized physics to account for the effects of the subgrid-scale processes (e.g., turbulence and cumulus convection) on the grid-resolvable scale state of atmosphere, *J*_{o} is discontinuous and nondifferentiable. These subgrid-scale physical processes in the atmosphere modify the local budget of mass, momentum, and energy and appear in the governing equations as source/sink terms. Usually, these processes depend strongly on local atmospheric conditions, that is, on the state in an individual grid-box column. These local conditions control “on–off” switches of the physical parameterizations. The on–off switches are governed by values of certain control variables or quantities derived from the control variables. If such a variable reaches a threshold value, these source/sink terms are turned on or off at a particular time step at a grid point.

Two examples are shown in Fig. 1, where *J*_{o} (denoted as *J*_{model}) is defined as a weighted sum of the 6-h forecast errors [see Eq. (21)]. For simplicity, the time filter coefficient (*α*_{A}) and the horizontal diffusion coefficient (*α*_{HD}) are chosen as the input control variables of *J*_{model} using an adiabatic (upper panels) and a diabatic (lower panels) National Centers for Environmental Prediction (NCEP) global spectral model, respectively. The values of *J*_{model} are calculated at an interval of 0.001 for *α*_{A} and 0.001 × 10^{16} for *α*_{HD}. The variation of *J*_{model} in the adiabatic case (Figs. 1a,c) shows a very smooth behavior in the control variable space, while the variation of *J*_{model} in the diabatic case is obviously nonsmooth.

Should we use the diabatic assimilation model in 4DVAR? A diabatic forecast model including parameterized physics usually simulates the evolution of the atmosphere better than an adiabatic one does. Theoretically, a more realistic assimilation model will produce better 4DVAR results than an adiabatic model if real observations are assimilated, not to mention the necessity of including model physics to assimilate some physical quantities such as clouds and precipitation. Therefore, efforts have been made to incorporate parameterized physics into 4DVAR experiments using primitive equation models (Zupanski 1993; Xiao et al. 2000; Zou et al. 1993b; Zou and Kuo 1996; Zhu and Navon 1999; Mahfouf and Rabier 2000). Numerical results in these studies showed not only reasonable convergence of minimization procedures in which differentiable minimization algorithms are used, but also various levels of improvement in analyses and model forecasts. In these experiments, some researchers retained the on–off switches of parameterizations in the tangent linear and adjoint models, as in the forward nonlinear model (Zou et al. 1993b; Zou and Kuo 1996; Zou 1997; Xiao et al. 1999; Mahfouf and Rabier 2000; hereafter referred to as the “classical” adjoint method). Others (Zupanski 1993; Zhu and Navon 1999) attempted to remove the on–off switches using a transitional smooth function to make the model solution and the cost function smooth. In contrast, using a perturbation analysis starting from a simple analytic model with a step-function source term, Xu (1996a) demonstrated that the classical adjoint method fails to correctly evaluate the gradient of a cost function that contains on–off switches. He proposed a generalized adjoint formalism that accounts for the effect of on–off switches in parameterized physical processes (Xu 1996a,b). The generalized adjoint was then extended to a vector system (Xu 1996b), as well as to multiple threshold conditions (Xu 1997a) and the time discretization situation (Xu 1997b). Furthermore, a “coarse-grain” tangent linearization and adjoint was proposed to deal with on–off switches triggered at discrete time levels (Xu et al. 1998). Numerical results on minimization were presented using the Lorenz-63 model (Xu et al. 1999). However, implementing this approach in data assimilation using a primitive equation model including complicated physical parameterization schemes seems difficult and requires further study.

In this study, we show that the cost function containing discontinuous parameterizations is *zeroth-order* discontinuous (i.e., the function itself is discontinuous due to on–off switches in the forward model), and that the classical adjoint method provides the one-sided gradient of such a cost function. This one-sided gradient, the limit of the ratio of vanishingly small perturbations of the cost function and control variable, can be used to construct a subgradient in a nondifferentiable optimization algorithm such as the bundle method (Lemarechal and Sagastizabal 1997; Zhang et al. 2000) for improving the minimization of discontinuous functions. We examine the behavior of such a discontinuous cost function in phase space. With the theoretical and numerical results from an adjoint model integration using the classical adjoint method, insights into the following questions are gained.

What does the tangent linear model solution fail to represent in the presence of on–off switches?

What does the result of an adjoint model integration represent, or fail to represent, in the presence of on–off switches?

How can the correctness of both the tangent linear and adjoint models be checked in the presence of on–off switches?

Why did a differentiable minimization algorithm still work well in many 4DVAR experiments that used a diabatic assimilation model with discontinuous physics?

This paper is arranged as follows. Section 2 uses an idealized simple model with a typical discontinuous source term to examine the behavior of the tangent linear and adjoint model solutions. A chain rule method is used to derive the gradient of a cost function at a continuous point, or a one-sided gradient at a discontinuous point, through an adjoint model integration. In section 3, a one-type cloud Arakawa–Schubert cumulus parameterization scheme employed in the NCEP global spectral model is used to investigate features of the forward model, cost function, TLM, and the adjoint model. Numerical results of the gradient, calculated using the adjoint of the NCEP diabatic model, are examined in section 4. Section 5 discusses the correctness check of the adjoint of the forward operator containing discontinuous physical processes. Summary and discussions are given in section 6.

## 2. Zeroth-order discontinuous cost function and its one-sided gradient calculation using classical adjoint

### a. Zeroth-order discontinuous cost function

Consider a forecast model written as

$$\frac{\partial \mathbf{x}}{\partial t} = \mathsf{F}(\mathbf{x}, \boldsymbol{\beta}), \tag{4}$$

where **x** is the model state vector and ***β*** represents a parameter vector that consists of physical parameters (such as the moistening parameter *b* in the Kuo scheme) and/or parameters introduced for computational stability (such as the filter coefficient). Numerical prediction of the state variable **x** is produced by numerical integration of a forecast system, which results from expanding the governing equations (4) onto a mesh system. When 𝗙(**x**, ***β***) involves no discontinuous process (e.g., an adiabatic model), both the model solution and the cost function defined by the model solution are differentiable with respect to control variables **x** and/or ***β***.

Consistent with the previous work (Xu 1996a,b), when the source/sink terms defined by parameterized physics are included in the assimilation model, 𝗙(**x**, ***β***) of (4) contains on–off switches and is differentiable only between two neighboring switches. The model operator 𝗙(**x**, ***β***) is thus piecewise differentiable. In this section, we first show that the model solution, which results from many time step integrations, and the cost function defined on it are zeroth-order discontinuous, that is, the function itself is discontinuous. In other situations, the cost function may be continuous but have discontinuous first- or higher-order derivatives. For simplicity, without changing the essence of the problem, we use a simple single-variable piecewise differentiable model to examine the behavior of the model solution in the time domain and a cost function defined by this model in the phase space.

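Consider the single-variable model

$$\frac{dx}{dt} = \begin{cases} f_1(x), & x < x_c, \\ f_2(x), & x \ge x_c, \end{cases} \tag{5}$$

where *x*_{c} is a threshold value, *f*_{1}(*x*) and *f*_{2}(*x*) are different (i.e., *f*_{1}(*x*) ≠ *f*_{2}(*x*) when *x* ≥ *x*_{c}) and both are differentiable functions. A similar step function was used in the “generalized” adjoint study (Xu 1996a). The (classical) tangent linearized version of (5) takes the form of

$$\frac{d\,\delta x}{dt} = \begin{cases} f'_1(x)\,\delta x, & x < x_c, \\ f'_2(x)\,\delta x, & x \ge x_c, \end{cases} \tag{6}$$

where *f*′_{1} and *f*′_{2} are the derivatives of *f*_{1} and *f*_{2} with respect to *x*, respectively. Starting from an initial value *x*_{0} < *x*_{c} and a perturbation Δ*x*_{0} that crosses *x*_{c} (i.e., *x*_{0} + Δ*x*_{0} ≥ *x*_{c}) with a forward time integration scheme, the tangent linear and nonlinear changes of the perturbation through one time step integration are, respectively,

$$\Delta x^{\mathrm{TL}}_{1} = \left[1 + \Delta t\, f'_1(x_0)\right]\Delta x_0, \tag{7}$$

$$\Delta x^{\mathrm{NL}}_{1} = \Delta x_0 + \Delta t\left[f_2(x_0 + \Delta x_0) - f_1(x_0)\right]. \tag{8}$$

The tangent linear solution therefore fails to represent the nonlinear change whenever the perturbation Δ*x*_{0} crosses a threshold at the first time step, no matter how small it is. Does this mean that the classical adjoint method fails to evaluate the gradient of the cost function containing discontinuous physics? In order to answer this question, we use the chain rule method to illustrate that an adjoint model integration provides a one-sided gradient at a discontinuous point of the cost function.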
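The one-step comparison above can be tried numerically. The following is a minimal sketch that assumes hypothetical source terms f_1(x) = 1 − 0.5x and f_2(x) = −2x, a threshold x_c = 1, and a forward Euler step with Δt = 0.1; none of these choices come from the paper. The perturbation is always placed so that it straddles the threshold.

```python
X_C = 1.0   # threshold x_c (hypothetical value)
DT = 0.1    # time step (hypothetical value)

def f(x):
    """Piecewise source term: f_1 below the threshold, f_2 at or above it."""
    return 1.0 - 0.5 * x if x < X_C else -2.0 * x

def dfdx(x):
    """Derivative of the branch that is active at x."""
    return -0.5 if x < X_C else -2.0

def step(x):
    """One forward Euler step of the nonlinear model."""
    return x + DT * f(x)

def nonlinear_change(x0, dx0):
    """Change of the model state after one step caused by a finite perturbation."""
    return step(x0 + dx0) - step(x0)

def tangent_linear_change(x0, dx0):
    """Same change as predicted by the model linearized about x0."""
    return (1.0 + DT * dfdx(x0)) * dx0

# Perturbations of decreasing size that always cross the threshold.
for dx0 in (1e-2, 1e-4, 1e-6):
    x0 = X_C - 0.5 * dx0
    print(dx0, nonlinear_change(x0, dx0), tangent_linear_change(x0, dx0))

# A perturbation that stays on one side of the threshold.
print(nonlinear_change(0.5, 1e-3), tangent_linear_change(0.5, 1e-3))
```

In this toy setting the nonlinear change stays close to Δt[f_2(x_c) − f_1(x_c)] = −0.25 however small the crossing perturbation is, while the tangent linear change shrinks with it; away from the threshold the two agree.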

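Considering the model state *x* after two time step integrations, one has four possible values for the model state at time *t*_{0} + 2Δ*t*:

$$x(t_0 + 2\Delta t) = x_1 + \Delta t\, f_j(x_1), \qquad x_1 = x_0 + \Delta t\, f_i(x_0), \qquad i, j \in \{1, 2\}, \tag{9}$$

where the indices *i* and *j* are determined by whether *x*_{0} and *x*_{1}, respectively, lie below the threshold *x*_{c} (index 1) or at or above it (index 2).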

As the *n*-step time integration proceeds for this one threshold situation, the number of possible switches that could occur increases exponentially by the law of 2^{n}. If the number of thresholds increases (say, *k*), the number of possible switches that were turned on and off will increase by the law of 2^{kn}. The model state between two neighboring switches is differentiable. We indicate that in many situations, however, states at nearby grid points will be mutually compatible, and the effective number of possible different states will be smaller than 2^{kn}.

A cost function defined as a continuous transformation of the model state in various times [usually a quadratic form such as the one in (11)] is also piecewise differentiable.

As an example, for the model (10) we define the cost function

$$J_n(x_0) = x^{2}(t_n), \tag{11}$$

where *t*_{n} = *t*_{0} + *n*Δ*t*. Using *t*_{0} = 0 and Δ*t* = 0.1, we obtain the distributions of *J*_{n} in the control variable (*x*_{0}) space (Fig. 2). It is found that the cost functions consist of differentiable pieces between the discontinuity points. The longer the assimilation window (the more time integration steps), the more discontinuity points the cost function contains.

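In general, an on–off switch in a parameterization is triggered when a condition such as *a* ≥ *a*_{c} is met, where *a* is a scalar function of the model input and *a*_{c} is a threshold value of *a* (which may also be a function of the model input).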

Here, *a* = *a*_{c} defines the locus of the discontinuities in input space. That locus will normally be a “hypersurface” with dimension *m* − 1, where *m* is the dimension of the input space (this picture may be complicated by various facts, and in particular by the fact that the functions *a* and *a*_{c}, because of prior on–off processes, may themselves be discontinuous functions of the input). But if we accept that the locus of the discontinuities resulting from each individual on–off process is an (*m* − 1)-dimensional hypersurface, the hypersurfaces defined by all such processes in the model will partition the input space into *m*-dimensional cells, within each of which the cost function will be continuously differentiable. Therefore, we say, the zeroth-order discontinuous cost function defined on parameterized physics is piecewise differentiable.

### b. Calculation of one-sided gradient of a zeroth-order discontinuous cost function by classical adjoint approach

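The gradient of the cost function at a point **x**_{0}, which is located within a certain cell, can be expressed by the chain rule as

$$\frac{\partial J}{\partial x_{0i}} = \sum_{j=1}^{n} \frac{\partial x^{t}_{j}}{\partial x_{0i}}\,\frac{\partial J}{\partial x^{t}_{j}}, \quad i = 1, \ldots, n, \qquad \text{that is,} \qquad \nabla_{\mathbf{x}_0} J = \mathsf{M}^{T}\,\nabla_{\mathbf{x}} J, \tag{12}$$

where *n* is the dimension of the model state, and **x**_{0} = (*x*_{01}, *x*_{02}, … , *x*_{0n})^{T} and **x** = (*x*^{t}_{1}, *x*^{t}_{2}, … , *x*^{t}_{n})^{T} represent the initial condition and the model state at time *t*, respectively. Here 𝗠 is a linear propagator derived from differentiating 𝗙(**x**, ***β***), which consists of a product of *t*_{R} differentiable subpropagators, one for each time step integration, where *t*_{R} denotes the total number of time step integrations, that is,

$$\mathsf{M} = \mathsf{M}_{t_R}\,\mathsf{M}_{t_R - 1} \cdots \mathsf{M}_{1}, \tag{13}$$

and each 𝗠_{t} can be evaluated in an individual cell. Therefore, we have

$$\nabla_{\mathbf{x}_0} J = \mathsf{M}_{1}^{T}\,\mathsf{M}_{2}^{T} \cdots \mathsf{M}_{t_R}^{T}\,\nabla_{\mathbf{x}} J. \tag{14}$$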

Figure 3 displays the variations of the gradients of the cost functions defined in (11) using the model (10) (the cost functions were shown in Fig. 2). These gradients, *dJ*_{n}/*dx*_{0}, are calculated by the chain rule as in (12) for a one-dimensional control variable situation, in which the adjoint model of (10) is developed using the classical adjoint method, that is, keeping the on–off switches the same as in the nonlinear model. The gradients evaluated from the classical adjoint integration are found to be identical to the direct analytical solution.
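The classical adjoint calculation described here can be sketched for a scalar toy model. The source terms, threshold, and step size below are hypothetical (they are not the paper's model (10)), and the cost function is taken as x²(t_n). The forward run records which branch was active at each step, and the backward sweep multiplies the transposed one-step Jacobians using those same switches. The result is compared with finite differences inside a cell and with one-sided finite differences at a discontinuity point.

```python
X_C, DT = 1.0, 0.1   # hypothetical threshold and time step

def forward(x0, n):
    """Run n forward Euler steps; return the trajectory and the switch states."""
    x, traj, switches = x0, [x0], []
    for _ in range(n):
        on = x >= X_C                        # on-off switch of the source term
        source = -2.0 * x if on else 1.0 - 0.5 * x
        x = x + DT * source
        switches.append(on)
        traj.append(x)
    return traj, switches

def cost(x0, n):
    """J_n(x0) = x(t_n)^2."""
    return forward(x0, n)[0][-1] ** 2

def adjoint_gradient(x0, n):
    """dJ_n/dx0 with the switches kept as in the nonlinear forward run."""
    traj, switches = forward(x0, n)
    adj = 2.0 * traj[-1]                     # dJ/dx at the final time
    for on in reversed(switches):
        dsource_dx = -2.0 if on else -0.5
        adj = (1.0 + DT * dsource_dx) * adj  # transposed one-step Jacobian (a scalar here)
    return adj

n = 6
# Inside a differentiable cell: centered finite difference.
g_cell = adjoint_gradient(0.9, n)
fd_cell = (cost(0.9 + 1e-6, n) - cost(0.9 - 1e-6, n)) / 2e-6

# At a discontinuity point (x0 exactly on the threshold): one-sided differences.
eps = 1e-7
g_disc = adjoint_gradient(X_C, n)
fd_right = (cost(X_C + eps, n) - cost(X_C, n)) / eps
fd_left = (cost(X_C, n) - cost(X_C - eps, n)) / eps

print("cell:", g_cell, fd_cell)
print("discontinuity:", g_disc, fd_right, fd_left)
```

In this sketch the adjoint value at the threshold matches the difference quotient taken on the side whose switches the forward run used, whereas the quotient taken across the jump is dominated by the jump itself.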

The derivations and numerical examinations above suggest that the accuracy with which the adjoint evaluates the gradient of the cost function is identical to the accuracy of the linearization of the forward model on which the tangent linear model is built. In other words, wherever the tangent linearization is conducted, the adjoint is unambiguously defined. On either side of a discontinuity point, the adjoint approach numerically determines the one-sided gradient of the cost function as accurately as the linearization is conducted on that side. This is consistent with existing theoretical understanding (Errico 1997). In practice, for a continuous function, one often uses a linear change along the tangent direction to approximately describe a finite nonlinear perturbation, that is, the tangent linear approximation. The analysis above shows that the accuracy of the calculated one-sided gradients is unaffected by the invalidity of the tangent linear approximation for a finite nonlinear perturbation that crosses the discontinuity point, that is, Δ*J* ≠ 𝗠Δ**x**, and is independent of the magnitude of the finite perturbations Δ**x**. The adjoint approach deals with vanishingly small perturbations, while the tangent linear approximation deals with finite perturbations. The degree of accuracy of the tangent linear approximation says something about the applications for which a numerically computed gradient can be useful, but that is a distinctly different question.

Therefore, the classical adjoint approach does not apply to finite perturbations in a nonlinear and discontinuous system, that is, to computing finite difference ratios of the form Δ*J*/Δ*x*_{i} for a given *J* defined by a discontinuous model solution. To determine such finite difference ratios numerically in a discontinuous nonlinear system, one explicitly computes the perturbation Δ*J* for each input perturbation Δ*x*_{i}. To deal with the spikes of Δ*J*/Δ*x*_{i}, however, a new approach such as the generalized adjoint (Xu 1996a,b; 1997a,b) has to be considered.

## 3. Numerical results using a one-type cloud Arakawa–Schubert cumulus parameterization

Sensitivity examinations of various parameterizations in the NCEP model showed that a simplified (one-type cloud) Arakawa–Schubert (hereafter referred to as OTC-AS) cumulus parameterization scheme (Pan and Wu 1995) causes the largest discontinuity of the model response (Zhang 2000). We therefore choose the OTC-AS cumulus parameterization scheme as a discontinuous model operator to investigate the features of the TLM and adjoint model solutions.

### a. Typical discontinuities

The original Arakawa–Schubert cumulus parameterization scheme (Arakawa and Schubert 1974) describes an ensemble of all-type clouds. In the implementation of the NCEP global model, the scheme was simplified as a one-type cloud (Pan and Wu 1995). In the simplified scheme, a simple cloud model is used to describe the thermodynamical properties, as illustrated schematically by Fig. 4. The one-type cloud definition includes the thickness of the cloud/subcloud (updraft) layer that is greater/less than 150 hPa. This cloud/subcloud layer is determined by the vertical profiles of moist static energy and/or saturated moist static energy. Conditional instability defines the originating height of updraft (entrainment), where moist static energy reaches its maximum in the lower troposphere. At or above this height, a disturbed moist air parcel ascends adiabatically, due to buoyancy, until saturated where the cloud base is defined by lifting condensation level. Afterward, the moist air parcel will continuously rise along a constant moist static energy line (constant *θ*_{se} line), assuming there is no heat exchange between the air parcel and the environment. The cloud top is defined when the vapor in the air parcel condenses completely.

Fillion and Errico (1997) studied the accuracy of a linearized convection operator and the resulting impact on minimization using a column model with a relaxed Arakawa–Schubert scheme. Our study examines the features of the solutions of the tangent linear model and the adjoint model of the discontinuous OTC-AS parameterization. Generally speaking, any “if” statement in a parameterization scheme is able to cause discontinuities in model response. In the OTC-AS parameterization, obvious discontinuities occur when the following physical considerations are implemented in the computer code.

The convection process is turned on/off as the restriction of the thickness of the cloud layer and/or subcloud layer is reached/broken.

The convection process is turned on/off as the cloud work functions cross their critical values.

The level indices representing the cloud/subcloud base/top heights are shifted up/down if a small change occurs in temperature and moisture profiles.

### b. Features of the nonlinear and tangent linear solutions

Figure 5 exhibits the numerical results of case 1, in which only the temperature at the second model level from the ground (*σ* = 0.9821) of a single column located at latitude 1.875°N and longitude 87°E [corresponding to model grid (43, 47) in a 62-wavenumber triangular truncation spectral model] is varied using an interval of 0.005 K. The other input variables (specific humidity, wind speed, vertical velocity, and the surface pressure) are kept unperturbed. The model solution, which is the adjusted temperature increment resulting from the OTC-AS parameterization at this model level, has two discontinuous points around *T* = 300.15 K and *T* = 300.7 K, where the OTC-AS cumulus convection process is turned on and off, respectively (Fig. 5a). Using 0.005 K as a small perturbation of the input temperature, we examine the features of the nonlinear and tangent linear perturbation solutions (Figs. 5b, 5c) resulting from a small perturbation made to each input temperature shown in Fig. 5a. The perturbations in Figs. 5b and 5c are identical, except at the discontinuity points. Figure 5a shows that the adjusted temperature varies linearly in effect between the discontinuity points, so that the tangent linear model solution is nearly constant. At discontinuity points, Fig. 5b (nonlinear model solution) shows peaks in perturbations that are absent from Fig. 5c (TLM solution). These peaks correspond to cases where the unperturbed and perturbed input temperatures lie on different sides of the discontinuity. The tangent linear model is, of course, unable to produce the corresponding jumps in the adjusted temperature. However, except at the discontinuous points, the tangent linear model is as good for computing perturbations as the full model (to first-order accuracy).

In case 2, both the temperature and specific humidity profiles are perturbed along a single direction,

$$\mathbf{T} = \mathbf{T}_u + \alpha\,\Delta\mathbf{T}, \qquad \mathbf{q} = \mathbf{q}_u + \alpha\,\Delta\mathbf{q}, \tag{15}$$

where Δ**T** = **T**_{r} − **T**_{u} and Δ**q** = **q**_{r} − **q**_{u}. The subscripts *u* and *r*, respectively, denote the “unperturbed” state and “reference” state of temperature and specific humidity profiles. Again, the other input variables (wind speed, vertical velocity, and the surface pressure) are kept unperturbed. Once the unperturbed and reference states are chosen, the parameter *α* is varied to control the magnitude of the input perturbations *α*Δ**T** and *α*Δ**q** so that a multidimensional problem is converted to a one-dimensional problem that can easily be examined graphically.

Three scalar functions, *R*_{1}, *R*_{2}, and *R*_{3}, are defined using the *l*_{2} Euclidean norm to represent the temperature and moisture increments due to cumulus convection (*R*_{1}), the nonlinear perturbation of the output temperature and moisture variables caused by a small change in the input variables (*R*_{2}), and the tangent linear solution (*R*_{3}), respectively. The temperature and moisture terms in the norm carry weighting coefficients, with units of K^{−2} for temperature and a value of 10^{6} kg^{2} kg^{−2} for specific humidity, corresponding to typical orders of simulated temperature and specific humidity errors of 1 K and 1 g kg^{−1}. In *R*_{3}, the Jacobian of the parameterization output with respect to **x**, evaluated at *α*, is the tangent linear operator of the OTC-AS parameterization, and Δ**x** = (Δ**T**, Δ**q**)^{T}.

The features of various model solutions with respect to the multidimensional control variable can now be illustrated in a single variable space of *α*.
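The three diagnostics can be illustrated with a toy stand-in for the parameterization. Everything in the sketch below is an assumption made for illustration: a two-component input, a linear trigger a(x) = W·x with threshold `A_C`, an adjustment that is affine in a once the switch is on, and unweighted Euclidean norms. It is not the OTC-AS scheme and does not reproduce the paper's definitions of R_1, R_2, and R_3 term by term.

```python
import numpy as np

W = np.array([1.0, 2.0])      # hypothetical weights of the trigger function a(x) = W.x
A_C = 3.005                   # hypothetical threshold a_c
D = np.array([-0.4, 0.1])     # hypothetical direction of the convective adjustment

def increment(x):
    """Adjustment produced by the toy scheme; zero while the switch is off."""
    a = W @ x
    return (0.5 + 0.2 * (a - A_C)) * D if a >= A_C else np.zeros(2)

def tl_increment(x, dx):
    """Tangent linear adjustment, keeping the switch of the unperturbed state."""
    a = W @ x
    return 0.2 * (W @ dx) * D if a >= A_C else np.zeros(2)

x_u = np.array([1.0, 0.5])    # "unperturbed" state, switch off
x_r = np.array([2.0, 1.0])    # "reference" state, switch on
dx = x_r - x_u
d_alpha = 0.01
alphas = np.linspace(0.0, 2.0, 201)

R1, R2, R3 = [], [], []
for alpha in alphas:
    x = x_u + alpha * dx
    R1.append(np.linalg.norm(increment(x)))
    R2.append(np.linalg.norm(increment(x + d_alpha * dx) - increment(x)))
    R3.append(np.linalg.norm(tl_increment(x, d_alpha * dx)))
R1, R2, R3 = map(np.array, (R1, R2, R3))

spikes = np.flatnonzero(np.abs(R2 - R3) > 1e-8)
print("alpha values where nonlinear and tangent linear perturbations differ:", alphas[spikes])
```

In this toy setting the nonlinear and tangent linear perturbations coincide everywhere except at the single value of α whose perturbed input falls on the other side of the switch.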

For the numerical test conducted below, we choose a vertical column where the convection process is inactive [the model grid (44, 47) at 1.875°N, 87°E] as a basic state (**T**_{u} and **q**_{u}), and another vertical column where the convection process is active [the model grid (43, 47) at 1.875°N, 85°E] as a reference state (**T**_{r} and **q**_{r}). We could predict that the nonlinear model response has at least one discontinuous point between *α* = 0 and *α* = 1 when the convection process is turned on.

Figure 6a shows the distribution of *R*_{1} with respect to *α* calculated at an interval of 0.01 (equivalent to a temperature interval of about 0.005 K and specific humidity interval of about 0.05 g kg^{−1}). For each of these *α* values, we then add a perturbation of Δ*α* = 0.01 to the input variables of the OTC-AS cumulus parameterization scheme. Results of the nonlinear perturbation and tangent linear perturbation are shown in Figs. 6b and 6c, respectively. From Fig. 6a, we found that when *α* varies from 0 to 1 the convective process is activated at *α* = 0.43, indicating the occurrence of discontinuity of the forward model solution in the phase space of *α*. Another discontinuity occurs near *α* = 1.28 when the cumulus cloud top moved from the 12th to 13th model level. The results are qualitatively similar to those shown in Fig. 5. Outside and between discontinuous points, *R*_{1} varies in effect linearly, so that the tangent linear model represents perturbations as accurately as the nonlinear model (Figs. 6c and 6b, respectively). At discontinuity points, the tangent linear model is unable to represent the sharp jumps in *R*_{1}.

The above experiments were repeated by including all the columns (384 model grids) on two selected latitude belts, for example, a pair of Gaussian latitude circles symmetric with respect to the equator. The temperature and moisture profiles derived from the NCEP reanalysis on the pair of Gaussian latitude circles on 3.75°N(S) were used as **T**_{u} and **q**_{u} and those on 1.875°N(S) as **T**_{r} and **q**_{r} in (15). Numerical results of *R*_{1}, *R*_{2}, and *R*_{3} are shown in Fig. 7. The nonlinear model solutions (Fig. 7a) contain many discontinuous points. The results are, again, very similar to those shown in Figs. 5 and 6. The tangent linear model accurately represents the (quasi linear) variations of *R*_{1} between discontinuous points, but does not represent the jumps at those points. These results are consistent with the examination using idealized examples (Xu 1996a,b; 1997a,b).

### c. Features of the adjoint model solutions

To examine the adjoint model solution, a cost function *J*_{AS} is defined by (20), in which the “observations” (**T**^{obs}, **q**^{obs}) are the output of the parameterized convection process operating on the reference state. The cost function and its gradient are evaluated for different *α* values with an interval of Δ*α* = 0.01. The corresponding numerical results are shown in Fig. 8 for both *J*_{AS} (dashed line) and ∇*J*_{AS} (solid line). Within the range of *α* from 0 to 2, *J*_{AS} has two discontinuity points (as described in Fig. 6). The gradient calculated by the adjoint model integrations, ∇*J*_{AS}, is also discontinuous at these two points. However, the slopes given by the adjoint-calculated gradient (for example, the thin solid lines crossing points P and Q) represent the one-sided tangent directions of *J*_{AS} on both sides of a discontinuity point. Results in Figs. 8 and 9 are consistent with the theoretical results of the adjoint model integration obtained in section 2 using the chain rule method. They may also explain why the minimization with active discontinuous physics still performed reasonably well in the work of Zou and Kuo (1996), Fillion and Errico (1997), and Fillion and Mahfouf (2000).

Using the same data as Fig. 7 (case 3), we compute *J*_{AS} defined on 384 columns by (20) (thick dashed line) and its gradient ∇_{α}*J*_{AS} (solid line) with different values of *α* using the interval Δ*α* = 0.001 to generate Fig. 9. In this case, the “observations” (**T**^{obs} and **q**^{obs}) are the temperature and moisture profiles after the convective adjustment on 384 columns on the latitudes 1.875°N(S). Similar to Fig. 7a, *J*_{AS} contains many discontinuity points. Between two neighboring discontinuity points, however, *J*_{AS} is continuous and differentiable. Therefore *J*_{AS} is piecewise differentiable on the whole *α* domain. When the number of the discontinuity points increases, it becomes difficult to identify these differentiable pieces, and the cost function may appear to be oscillating with respect to a control parameter (see Figs. 1c,d). Compared to the variation of *J*_{AS}, the distribution of the gradient derived from the adjoint integration is rather smooth, with only one stationary point. Evidently, the adjoint model integration of a discontinuous physical parameterization still provides either a tangential slope at a continuity point (point A, for instance) or a one-sided tangential slope at a discontinuity point (point B or C, for instance). It is also worth noting that the variation of the gradient with respect to *α* is in fact linear, which means that the variation of the cost function between its obvious discontinuity points is quadratic, a feature that helps a minimization work well.
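The link between a linear gradient and a piecewise quadratic cost function can be written out explicitly. The following is a sketch under two assumptions: the input follows the one-parameter path of (15), written compactly as **x**(*α*) = **x**_{u} + *α*Δ**x**, and the adjoint-calculated gradient has the same linear form in every differentiable cell. The constants *c*, *d*, and *e*_{k} are introduced here for illustration only.

```latex
\frac{dJ_{AS}}{d\alpha}
  = (\Delta\mathbf{x})^{T}\,\nabla_{\mathbf{x}} J_{AS}\Big|_{\mathbf{x}(\alpha)},
\qquad
\frac{dJ_{AS}}{d\alpha} = c\,\alpha + d
\;\Longrightarrow\;
J_{AS}(\alpha) = \tfrac{1}{2}\,c\,\alpha^{2} + d\,\alpha + e_{k}
\quad \text{for } \alpha \text{ in cell } k .
```

Under these assumptions the jumps at the discontinuity points only change the offsets *e*_{k} from one cell to the next, and these offsets do not enter the gradient that a differentiable minimization algorithm uses.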

## 4. Examining the gradients calculated using the adjoint of the NCEP diabatic global spectral model

### a. Description of discontinuous physical processes and selection of control parameters

The NCEP global spectral model consists of a set of prognostic equations, including the tendency equations of vorticity, divergence, virtual temperature, specific humidity, and the surface pressure (Sela 1982, 1987). The model physics include large-scale precipitation, cumulus convection, shallow convection, vertical diffusion, gravity wave drag, and surface and boundary processes. The large-scale (grid scale) condensation consists of condensation in supersaturated layers and evaporation in unsaturated layers. It adjusts both the temperature and specific humidity. The cumulus parameterization scheme is that described in section 3. The shallow convection is the Betts scheme (Betts 1986). A nonlocal vertical diffusion scheme is adapted (Hong and Pan 1996). The gravity wave drag parameterization is the Pierrehumbert (1986) scheme. The surface processes include a two-layer soil model. In the related processes the Monin–Obukhov similarity theory is applied to simulate the change of the surface (air–ground interface) temperature and humidity, and to calculate exchange coefficients and turbulence fluxes in the planetary boundary layer. These physical parameterization schemes are included in both the nonlinear and the adjoint models. Longwave and shortwave radiation are kept constant during the 6-h assimilation window.

The model dynamical core includes horizontal diffusion and time filtering in which the filter coefficient (*α*_{A}) is taken as 1 − 2*ϵ*, where *ϵ* is the classical Asselin filter coefficient. The model carries out the time filter and horizontal diffusion at each time step integration. Although both the time filter coefficient *α*_{A} and the horizontal diffusion coefficient *α*_{HD} are not explicitly related to the model physical processes, any change in the values of these coefficients can affect model state and thus on–off switches in the parameterization schemes during the time integration. Therefore, these coefficients (*α*_{A} or *α*_{HD}) play equivalent roles to *α* in (15), where *α* controls the magnitude of perturbation of a multidimensional control variable such as initial condition. Then, any cost function (*J*) defined by the model solution and its gradient (∇*J*) can be evaluated using a series of *α*_{A} or *α*_{HD} values rather than perturbing initial conditions so that the features of the *J* and ∇*J* can be illustrated in *α*_{A} or *α*_{HD} space. Therefore, this study selects *α*_{A} and *α*_{HD} as the control variables to represent some numerical features of discontinuity in 4DVAR.
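For reference, the relation *α*_{A} = 1 − 2*ϵ* can be read off the classical Robert–Asselin time filter. The block below is a sketch that assumes the textbook three-time-level form of that filter; the overbar notation for filtered values and the time index *n* are not from the source, and the NCEP implementation may differ in detail.

```latex
\bar{x}^{\,n}
  = x^{n} + \epsilon\left(\bar{x}^{\,n-1} - 2x^{n} + x^{n+1}\right)
  = (1 - 2\epsilon)\,x^{n} + \epsilon\left(\bar{x}^{\,n-1} + x^{n+1}\right)
  = \alpha_{A}\,x^{n} + \frac{1 - \alpha_{A}}{2}\left(\bar{x}^{\,n-1} + x^{n+1}\right).
```

With this form, the standard operational value *α*_{A} = 0.92 quoted in this section corresponds to *ϵ* = 0.04.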

### b. Examinations of the calculated gradient through the NCEP diabatic adjoint

… *l*_{2} Euclidean norm, of two vectors. Here ***α*** represents the control variable vector and we choose ***α*** = (*α*_{A}, *α*_{HD})^{T} in this case. The **x**^{analysis} is the analysis state at 0600 UTC on 1 November 1995, representing observations **y**_{obs} in (1), and **x**^{6-h forecast} is the 6-h model forecast state starting from the initial state (**x**_{0}) at 0000 UTC on 1 November 1995. The model resolution is 62 waves (not including the zonal mean) in horizontal domain and 28 vertical levels. The state vector **x** consists of divergence (**D**), vorticity (*ζ*), virtual temperature (**T**), specific humidity (**q**), and the surface pressure (**P**_{s}), that is, **x** = (**D**, *ζ*, **T**, **q**, **P**_{s})^{T}. Here the observational operator H is a unity matrix in this case. Finally, 𝗪 is a diagonal weighting matrix approximated by the inverse of the absolute value of maximal 6-h differences of state variables. In this case, the diagonal values of 𝗪 are 1.2 × 10^{−6} s^{−1} for divergence, 2.4 × 10^{−6} s^{−1} for vorticity, 1 K for temperature, and 10 mb for surface pressure.

Variations of the cost function *J*_{model} defined by (21) and its gradient evaluated using the adjoint of the NCEP adiabatic (top panel) and diabatic (bottom panel) nonlinear forward models with respect to *α*_{A} (left) and *α*_{HD} (right) are shown in Figs. 1 and 10, respectively. The standard values of *α*_{A} and *α*_{HD} in the NCEP operational model are 0.92 and 0.3 × 10^{16}, respectively. Values of both *J*_{model} and ∇*J*_{model} are computed at an interval of 0.001 for *α*_{A} and 0.001 × 10^{16} for *α*_{HD}. Comparing Figs. 1c,d with Figs. 10c,d, we found that, although the diabatic *J*_{model} appears jagged with respect to both coefficients because of the jumps in the nonlinear model solution, the gradients calculated with the diabatic adjoint do not follow the jagged aspect but are consistent with the general convexity of *J*_{model}. There is only one point in the phase space at which the gradient is zero, a necessary condition for a minimum. These characteristics of the calculated gradient confirm that an adjoint integration evaluates the one-sided gradient at a discontinuity point as correctly as it does the gradient at a differentiable point. In fact, *J*_{model} in Figs. 1c,d can be viewed as an extension of the *J*_{AS} in Fig. 9, which is piecewise differentiable, with many more discontinuities when all the discontinuous parameterization schemes are used in a global domain within a 6-h assimilation window (a total of 12 time steps using 30 min as the time integration step size). When the one-dimensional input space contains too many discontinuities, the cost function curve may appear “oscillating” within a finite interval of *α*_{A} (*α*_{HD}) while the corresponding gradient curve appears relatively smooth. This phenomenon may be case dependent because there is no guarantee that the cost function produced by the diabatic model has only zeroth-order discontinuities. In addition, we notice that the variations of the gradient are more linear (and the cost function must therefore be more nearly quadratic) in the diabatic case than in the adiabatic case, implying that the Hessian matrix is nearly constant in the diabatic case. Further research is needed to examine the nature of a diabatic cost function and clarify these issues, especially when the model resolution is increased.
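The way an adjoint integration returns the local slope of a jumping cost function can be sketched with a toy model. Everything in the sketch is hypothetical: the one-variable model with a single on–off switch, the 12-step window, the observation value, and the names `forward`, `cost`, and `adjoint_gradient`. The reverse sweep reuses the switch record saved by the forward integration and multiplies by the derivative of whichever branch was active, which is the chain rule applied step by step.

```python
def forward(alpha, nsteps=12, x_crit=1.0):
    """Toy nonlinear model with an on-off switch; returns final state and switch record."""
    x, switches = alpha, []
    for _ in range(nsteps):
        on = x >= x_crit                         # regime chosen from the current state
        switches.append(on)
        x = 0.9 * x + 0.2 if on else 1.05 * x    # each branch is differentiable by itself
    return x, switches

def cost(alpha, obs=1.5):
    """Quadratic misfit between the final model state and a fixed 'observation'."""
    x_final, _ = forward(alpha)
    return 0.5 * (x_final - obs) ** 2

def adjoint_gradient(alpha, obs=1.5):
    """Reverse sweep: apply the branch derivatives in reverse order of the time steps."""
    x_final, switches = forward(alpha)
    x_adj = x_final - obs                        # dJ/dx at the final time
    for on in reversed(switches):
        x_adj *= 0.9 if on else 1.05             # derivative of the active branch
    return x_adj

# Inside a differentiable cell the adjoint gradient matches a finite difference.
a = 0.8
fd = (cost(a + 1e-6) - cost(a - 1e-6)) / 2e-6
print(adjoint_gradient(a), fd)
```

In this toy model the cost function jumps wherever a small change in `alpha` flips one of the recorded switches, while the adjoint gradient remains the exact slope of the cell in which the basic state lies.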

## 5. Correctness check of the TLM and adjoint over multiphysical regimes in the presence of discontinuities

A ratio (Φ_{l} for the TLM test and Φ_{g} for the gradient test) is defined to measure the discrepancy between the nonlinear perturbation and its first-order approximation. In its definition, the norm is the *l*_{2} Euclidean norm, the tangent linear operator is evaluated at the basic state ***α***, *J* is the defined cost function such as in (20) or (21), ∇_{α}*J* is the gradient derived from an adjoint integration, and *β* is a scalar controlling the size of the input perturbation. For a differentiable *J*, once the basic state ***α*** (usually an initial analysis) and the perturbation δ***α*** (usually taken as ∇*J*/‖∇*J*‖) are chosen, the ratio is expected to approach 1 linearly, with increasing accuracy as *β* becomes progressively smaller. Assuming Φ_{l}(*β*) = 1 + *Aβ* and *β* = 10^{−*i*}, the function log(Φ_{l} − 1) (= log *A* − *i*) is a linear function of *i*.
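The expected linear approach of the ratio to 1 can be checked on a small example. The sketch below assumes the commonly used form Φ_{g}(*β*) = [*J*(***α*** + *β*δ***α***) − *J*(***α***)] / [*β* δ***α***^{T}∇*J*(***α***)]. The quartic cost function, its analytic gradient (standing in for an adjoint-derived gradient), the basic state, and the names `cost`, `grad`, and `phi_g` are hypothetical.

```python
import math

def cost(a):
    """Smooth stand-in cost function of two control variables."""
    return a[0] ** 4 + 2.0 * a[1] ** 2

def grad(a):
    """Analytic gradient, playing the role of the gradient from an adjoint integration."""
    return [4.0 * a[0] ** 3, 4.0 * a[1]]

def phi_g(a, beta):
    """Ratio of the nonlinear change of the cost to its first-order approximation."""
    g = grad(a)
    norm = math.sqrt(sum(gi * gi for gi in g))
    d = [gi / norm for gi in g]                          # perturbation: grad / ||grad||
    perturbed = [ai + beta * di for ai, di in zip(a, d)]
    nonlinear = cost(perturbed) - cost(a)                # numerator
    linear = beta * sum(di * gi for di, gi in zip(d, g)) # denominator
    return nonlinear / linear

basic_state = [1.0, 0.5]
logs = [math.log10(abs(phi_g(basic_state, 10.0 ** -i) - 1.0)) for i in range(1, 7)]
for i, value in enumerate(logs, start=1):
    print(i, value)   # drops by about one unit per decade of beta
```

A slope close to −1 in this table indicates that the gradient is consistent with the cost function. The test degrades once *β* is so small that round-off dominates the numerator.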

In section 2, we proved that the adjoint integration provides the one-sided gradient for the cost function defined by the parameterized physics at a discontinuity point. To examine this conclusion numerically, we use the gradient of *J*_{AS} [see (20)] evaluated by the adjoint of the OTC-AS parameterization column model at either side of a discontinuity point (points P and Q in Fig. 8) to conduct the gradient test. Figure 11 presents the variation of log(|Φ_{g} − 1|) with respect to log|*β*| for a left (−*β*) and right (+*β*) perturbation [the basic perturbation is taken as Δ**T** and Δ**q**, see (15) for case 2]. It is found that, at the discontinuity point, the gradients computed by the adjoint of the OTC-AS parameterization are correct from either side, although the two gradients (at points P and Q in Fig. 8) are different.
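A one-sided version of the gradient test can be sketched as follows. The cost function with a single jump at `ALPHA_D`, its branch derivatives, and the points `P` and `Q` are hypothetical stand-ins for the OTC-AS column model, and `math.nextafter` requires Python 3.9 or later. The left test perturbs `P` with −*β*, the right test perturbs `Q` with +*β*, and a third sequence perturbs `P` across the jump for contrast.

```python
import math

ALPHA_D = 1.0   # hypothetical location of the on-off switch

def cost(alpha):
    if alpha < ALPHA_D:
        return (alpha - 2.0) ** 2              # "off" branch
    return (alpha - 2.0) ** 2 + 0.5 * alpha    # "on" branch; the value jumps by 0.5 at ALPHA_D

def branch_gradient(alpha):
    """Derivative of the branch active at alpha, i.e. a one-sided slope at the switch."""
    if alpha < ALPHA_D:
        return 2.0 * (alpha - 2.0)
    return 2.0 * (alpha - 2.0) + 0.5

def phi_g(alpha, beta):
    return (cost(alpha + beta) - cost(alpha)) / (beta * branch_gradient(alpha))

P = math.nextafter(ALPHA_D, -math.inf)   # last representable point on the left branch
Q = ALPHA_D                              # first point on the right branch

left = [abs(phi_g(P, -(10.0 ** -i)) - 1.0) for i in range(1, 6)]
right = [abs(phi_g(Q, 10.0 ** -i) - 1.0) for i in range(1, 6)]
cross = [abs(phi_g(P, 10.0 ** -i) - 1.0) for i in range(1, 6)]
print(left[-1], right[-1], cross[-1])
```

In this sketch both one-sided sequences shrink in proportion to *β* even though the two branch gradients differ, whereas the sequence that crosses the jump grows as *β* decreases.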

Because the on–off switches in model parameterizations may produce many discontinuities in practice, and one does not always know where these discontinuities are, a test that computes Φ_{g} over a range of basic states may provide more information for verifying the usefulness of an adjoint. With the same technique used in Fig. 7, we can check the behavior of the adjoint solution with respect to a set of basic states defined by different *α* values. Figure 12 exhibits the nonlinear change (the numerator of the Φ_{g} expression; Fig. 12a) and the first-order approximation (the denominator of the Φ_{g} expression; Fig. 12b) of a perturbed cost function using the OTC-AS parameterization and its adjoint, and their ratio (Φ_{g}; Fig. 12c). The cost function is defined by (20) and the basic state by (15), where the states on the latitudes of 1.875°N(S) and 3.75°N(S) are used as **T**_{r}, **q**_{r} and **T**_{u}, **q**_{u} through a 0.001 interval of *α*.

Figure 12 shows that the ratio of nonlinear to linear perturbations derived from the adjoint (Figs. 12a and 12b, respectively) leads to conclusions similar to those obtained from Figs. 5–7. Between discontinuity points, the ratio is close to 1, meaning that the linear approximation derived from the adjoint accurately describes the nonlinear perturbations of the cost function. This is true except in the vicinity of the point *α* = 1, where the gradient is equal to 0 and the ratio becomes infinite. At discontinuity points, however, the linear approximation does not capture the sharp variations of the cost function *J*_{AS}.
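A scan of this kind over many basic states can be sketched with a one-variable example. The cost function (a parabola with its minimum at *α* = 1 and one jump placed at *α* = 1.5005), the scan range, and the names `cost`, `local_gradient`, and `scan` are assumptions chosen only to mimic the three behaviors described for Fig. 12; they are not the OTC-AS scheme.

```python
def cost(alpha):
    """Hypothetical piecewise-differentiable cost: a parabola plus a single jump."""
    return (alpha - 1.0) ** 2 + (0.2 if alpha >= 1.5005 else 0.0)

def local_gradient(alpha):
    """Slope of the active branch, which is what an adjoint integration would return."""
    return 2.0 * (alpha - 1.0)

def scan(start=0.5, stop=2.0, dalpha=0.001):
    """Return (alpha, nonlinear change, linear approximation, ratio) for each basic state."""
    rows = []
    nsteps = round((stop - start) / dalpha)
    for k in range(nsteps):
        alpha = start + k * dalpha
        nonlinear = cost(alpha + dalpha) - cost(alpha)
        linear = dalpha * local_gradient(alpha)
        ratio = nonlinear / linear if linear != 0.0 else float("inf")
        rows.append((alpha, nonlinear, linear, ratio))
    return rows

rows = scan()
# Basic states whose ratio departs from 1 by more than 5 percent.
flagged = [round(alpha, 3) for alpha, _, _, ratio in rows if abs(ratio - 1.0) > 0.05]
print(flagged)
```

In this sketch the flagged basic states cluster around the zero of the gradient and at the single state whose perturbation crosses the jump, while the ratio stays close to 1 elsewhere.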

## 6. Summary and conclusions

A diabatic forecast model including parameterized physics simulates the evolution of the atmosphere better than an adiabatic one does. Efforts have been made to incorporate parameterized physical processes into 4DVAR experiments to include satellite and radar measurements of rain rates and cloud properties and thus to enhance forecast skills. However, physical parameterizations include on–off switches that come from regime changes in the physical processes of the real atmosphere due to the numerical discretization. Many of them are hard to remove. Examples include the triggering of moist processes and different regimes in the surface processes. These on–off switches make the nonlinear model solution and the cost function defined in 4DVAR discontinuous.

By examining the numerical results from the tangent linear and adjoint operators of a simple idealized model, a simplified Arakawa–Schubert cumulus parameterization scheme, and the NCEP diabatic model, along with an analytical explanation for these calculated results, we illustrated that (i) the tangent linear approximation of a diabatic model is valid only in the differentiable interior of cells bounded in input space by discontinuity surfaces, and it cannot represent the nonlinear perturbation growth when a perturbation crosses a discontinuity; (ii) the gradient calculated using the adjoint of a parameterization or a diabatic model correctly reflects a local tangential (or one-sided tangential) slope of the cost function, since the adjoint model integration is a numerical implementation of the chain rule that treats a complex NWP model as a compound transformation; and (iii) testing the tangent linear and adjoint codes with only a single basic state may be inadequate to ensure the correctness of the codes in all physical regimes when the model is discontinuous, so additional checks based on different basic states are recommended.

The gradient calculated using an adjoint model mainly serves as a descent direction in minimization (Navon and Legler 1987; Liu and Nocedal 1989; Zou et al. 1993a). A commonly used minimization algorithm, such as the limited-memory quasi-Newton algorithm, uses this information about local tangential slopes to define a search direction at each iteration. The cost function seen by the minimization can therefore be viewed as a version smoothed by a set of gradients. This smoothing depends on the step size used in the line search and yields a general convexity for the zigzag, jumping cost function curve. The general convexity may be governed mainly by the adiabatic dynamics of the model. The subgrid small-scale processes described by the parameterized physics therefore may not harm the general convexity of the cost function, although they make the cost function jagged. This explains, to some degree, why a minimization algorithm that was developed for differentiable cost functions may still perform well for most 4DVAR problems using a diabatic assimilation model. Examples include assimilation experiments using precipitation (Zou and Kuo 1996) and precipitable water (Kuo et al. 1996) with a diabatic mesoscale model and its adjoint, and 1DVAR minimization using the relaxed Arakawa–Schubert scheme (Fillion and Mahfouf 2000). However, under certain specific situations, the limited-memory quasi-Newton algorithm may fail to minimize a piecewise differentiable cost function (Zhang et al. 2000). In that case, a nonsmooth optimization algorithm, such as the bundle method (Lemaréchal 1978), may improve the minimization (Zhang et al. 2000). Even a nonsmooth optimization algorithm using subgradients is still not free of problems (Zhang et al. 2000). In such a situation, further testing of and thought on the generalized adjoint (Xu 1996a,b; 1997a,b; Xu et al. 1998; Xu and Gao 1999) may offer one potential way to solve the problem.

This study discussed only the case of zeroth-order discontinuities, in which the cost function under consideration is itself discontinuous. Higher-order discontinuities, in which the cost function is continuous but one of its derivatives is discontinuous, may also be present in numerical models of the atmospheric flow. The impact of higher-order discontinuities on variational data assimilation, especially for meso- and convection-scale analysis, requires further research.

This study only examined some of the numerical results from tangent linear and adjoint models of either simple models or a coarse-resolution global prediction model. The investigation of the behavior of the cost function and its minimization with discontinuities for meso- and convection-scale analysis is required and planned for future work.

## Acknowledgments

The research is supported by NOAA Grant NA77WA0571 and NSF Grant ATM-9812729. The authors would like to thank Drs. E. Kalnay and J. Sela for their persistent support and encouragement, and Dr. Olivier Talagrand and Dr. Qin Xu for their suggestions that were useful for improving the original manuscript. Thanks go to two anonymous reviewers for thorough and helpful comments and suggestions.

## REFERENCES

Arakawa, A., and W. H. Schubert, 1974: Interaction of a cumulus ensemble with the large-scale environment. Part I. *J. Atmos. Sci.*, **31**, 674–701.

Betts, A. K., 1986: A new convective adjustment scheme. Part I: Observational and theoretical basis. *Quart. J. Roy. Meteor. Soc.*, **112**, 677–691.

Courtier, P., and O. Talagrand, 1987: Variational assimilation of meteorological observations with the adjoint equation. Part I: Numerical results. *Quart. J. Roy. Meteor. Soc.*, **113**, 1329–1347.

Errico, R. M., 1997: What is an adjoint model? *Bull. Amer. Meteor. Soc.*, **78**, 2577–2591.

Fillion, L., and R. Errico, 1997: Variational assimilation of precipitation data using moist convective parameterization schemes: A 1D-Var study. *Mon. Wea. Rev.*, **125**, 2917–2942.

Fillion, L., and J-F. Mahfouf, 2000: Coupling of moist-convective and stratiform precipitation processes for variational data assimilation. *Mon. Wea. Rev.*, **128**, 109–124.

Hong, S. Y., and H. L. Pan, 1996: Nonlocal boundary-layer vertical diffusion in a medium-range model. *Mon. Wea. Rev.*, **124**, 2322–2339.

Kuo, Y-H., X. Zou, and Y-R. Guo, 1996: Variational assimilation of precipitable water using a nonhydrostatic mesoscale adjoint model. Part I: Moisture retrieval and sensitivity experiments. *Mon. Wea. Rev.*, **124**, 122–147.

Le Dimet, F. X., and O. Talagrand, 1986: Variational algorithms for analysis and assimilation of meteorological observations: Theoretical aspects. *Tellus*, **38A**, 97–110.

Lemaréchal, C., 1978: Nonsmooth optimization and descent methods. International Institute for Applied System Analysis, Laxenburg, Austria, 25 pp.

Lemaréchal, C., and C. Sagastizabal, 1997: Variable metric bundle methods: From conceptual to implementable forms. *Math. Programm.*, **76**, 393–410.

Liu, D. C., and J. Nocedal, 1989: On the limited memory BFGS method for large scale optimization. *Math. Programm.*, **45**, 503–528.

Mahfouf, J-F., and F. Rabier, 2000: The ECMWF operational implementation of four-dimensional variational assimilation. Part II: Experimental results with improved physics. *Quart. J. Roy. Meteor. Soc.*, **126**, 1171–1190.

Pan, H. L., and W. S. Wu, 1995: Implementing a mass flux convection parameterization package for the NMC medium-range forecast model. NMC/NOAA/NWS Tech. Rep., 409, 40 pp.

Pierrehumbert, R. T., 1986: An essay on the parameterization of orographic gravity wave drag. [Available from GFDI/NOAA, Princeton University, Princeton, NJ 08542.]

Sela, J., 1982: The NMC spectral model. NOAA Tech. Rep. NWS 30, 36 pp.

Sela, J., 1987: The new T80 NMC operational spectral model. Preprints, *Eighth Conf. Numerical Weather Prediction*, Baltimore, MD, Amer. Meteor. Soc., 312–313.

Xiao, Q., X. Zou, and Y-H. Kuo, 2000: Incorporating the SSM/I-derived precipitable water and rainfall rate into a numerical model: A case study for the ERICA IOP-4 cyclone. *Mon. Wea. Rev.*, **128**, 87–108.

Xu, Q., 1996a: Generalized adjoint for physical processes with parameterized discontinuities. Part I: Basic issues and heuristic examples. *J. Atmos. Sci.*, **53**, 1123–1142.

Xu, Q., 1996b: Generalized adjoint for physical processes with parameterized discontinuities. Part II: Vector formulations and matching conditions. *J. Atmos. Sci.*, **53**, 1143–1155.

Xu, Q., 1997a: Generalized adjoint for physical processes with parameterized discontinuities. Part III: Multiple threshold conditions. *J. Atmos. Sci.*, **54**, 2713–2721.

Xu, Q., 1997b: Generalized adjoint for physical processes with parameterized discontinuities. Part IV: Problems in time discretization. *J. Atmos. Sci.*, **54**, 2722–2728.

Xu, Q., and J. Gao, 1999: Generalized adjoint for physical processes with parameterized discontinuities. Part VI: Minimization problems in multidimensional space. *J. Atmos. Sci.*, **56**, 994–1002.

Xu, Q., J. Gao, and W. Gu, 1998: Generalized adjoint for physical processes with parameterized discontinuities. Part V: Coarse-grain adjoint and problems in gradient check. *J. Atmos. Sci.*, **55**, 2130–2135.

Zhang, S., 2000: Use of adjoint physics in 4D VAR with the NCEP Global Spectral Model. Ph.D. dissertation, The Florida State University, 185 pp.

Zhang, S., X. Zou, J. Ahlquist, I. M. Navon, and J. G. Sela, 2000: Use of differentiable and nondifferentiable optimization algorithms for variational data assimilation with discontinuous cost functions. *Mon. Wea. Rev.*, **128**, 4031–4044.

Zhu, Y., and I. M. Navon, 1999: Impact of key parameters estimation on the performance of the FSU spectral model using the full physics adjoint. *Mon. Wea. Rev.*, **127**, 1497–1517.

Zou, X., 1997: Tangent linear and adjoint of “on-off” processes and their feasibility for use in 4-dimensional variational data assimilation. *Tellus*, **49A**, 3–31.

Zou, X., and Y-H. Kuo, 1996: Rainfall assimilation through an optimal control of initial and boundary conditions in a limited-area mesoscale model. *Mon. Wea. Rev.*, **124**, 2859–2882.

Zou, X., I. M. Navon, M. Berger, P. K. H. Phua, T. Schlick, and F. X. LeDimet, 1993a: Numerical experience with limited-memory quasi-Newton and truncated-Newton methods. *SIAM J. Optimization*, **3**, 582–608.

Zou, X., I. M. Navon, and J. G. Sela, 1993b: Variational data assimilation with moist threshold processes using the NMC spectral model. *Tellus*, **45A**, 370–387.

Zupanski, D., 1993: The effects of discontinuities in the Betts–Miller cumulus convection scheme on four-dimensional variational data assimilation in a quasi-operational forecasting environment. *Tellus*, **45A**, 511–524.

Fig. 2. Variation of the cost function *J*_{n} of a single-variable discontinuous model [see (10) and (11)]. The number “*n*” represents the total time steps in the assimilation window.

Citation: Monthly Weather Review 129, 11; 10.1175/1520-0493(2001)129<2791:EONRFT>2.0.CO;2


Fig. 3. Same as Fig. 2 except for gradient.


Fig. 4. A schematic diagram of the OTC-AS cumulus parameterization scheme.


Fig. 5. Variation of (a) the temperature adjustment (*T*^{′}_{2}*T* = 0.005 K added to the input temperature on the same level, (b) the nonlinear perturbation solution, and (c) the tangent linear approximation introduced by an input perturbation of temperature 0.005 K at the second model level.


Fig. 6. Variation of the *l*_{2} Euclidean norm of (a) the adjusted temperature and specific humidity [*R*_{1}, see (16)] of the OTC-AS parameterization at a column around 1.875°N, 87°E calculated at an interval of Δ*α* = 0.01, (b) the nonlinear perturbation solution produced by an input perturbation of Δ*α* = 0.01 [*R*_{2}, see (17)], and (c) the corresponding tangent linear approximation to the nonlinear perturbation [*R*_{3}, see (18)].


Fig. 7. Same as Fig. 6 except for all the 384 columns around latitude 3.75°N(S) and Δ*α* = 0.001.


Fig. 8. Variation of the cost function (*J*_{AS}) defined by (20) (thick dashed line), and the gradient (∇_{α}*J*_{AS}) (thick solid line) calculated from the adjoint of OTC-AS parameterization. A straight line crossing point *P* represents a local tangential slope derived from the adjoint. For graphing, the gradient is multiplied by a factor of 0.5. Both *J*_{AS} and ∇_{α}*J*_{AS} are calculated at an interval of *α* = 0.01.


Fig. 9. Same as Fig. 8 except that the OTC-AS parameterization scheme is used on 384 columns around latitude 3.75°N(S) where the input state is varied through (15) using Δ*α* = 0.001; observations are defined as the model solution of the OTC-AS parameterization on those columns around latitude 1.875°N(S), and a factor of 10^{−2} is used for graphing the gradient. A zero-gradient line (dotted line) is marked as a reference.


Fig. 10. Same as Fig. 1 except for gradient. A zero-gradient line is marked as a reference (dotted line) for each panel.


Fig. 11. The variation of log_{10}(|Φ_{g} − 1|) for the OTC-AS parameterization column model with −*β* (testing the left derivative) and +*β* (testing the right derivative) at a discontinuity point (point P in Fig. 8). The definition of the cost function and the data used are the same as in Fig. 8. The basic perturbation is taken as Δ**T** and Δ**q** [see Eq. (15)].


Fig. 12. Variation of (a) the nonlinear change of the perturbed cost function (*J*_{AS}) produced by Δ*α* = 0.001 [see (15)], (b) the leading order (linear) approximation derived from the adjoint of the perturbation of *J*_{AS}, and (c) the ratio of the nonlinear change and the linear approximation, with *α*.
