  • Arakawa, A., and W. H. Schubert, 1974: Interaction of a cumulus ensemble with the large-scale environment. Part I. J. Atmos. Sci., 31, 674–701.

  • Betts, A. K., 1986: A new convective adjustment scheme. Part I: Observational and theoretical basis. Quart. J. Roy. Meteor. Soc., 112, 677–691.

  • Courtier, P., and O. Talagrand, 1987: Variational assimilation of meteorological observations with the adjoint equation. Part I: Numerical results. Quart. J. Roy. Meteor. Soc., 113, 1329–1347.

  • Errico, R. M., 1997: What is an adjoint model? Bull. Amer. Meteor. Soc., 78, 2577–2591.

  • Fillion, L., and R. Errico, 1997: Variational assimilation of precipitation data using moist convective parameterization schemes: A 1D-Var study. Mon. Wea. Rev., 125, 2917–2942.

  • Fillion, L., and J-F. Mahfouf, 2000: Coupling of moist-convective and stratiform precipitation processes for variational data assimilation. Mon. Wea. Rev., 128, 109–124.

  • Hong, S. Y., and H. L. Pan, 1996: Nonlocal boundary-layer vertical diffusion in a medium-range model. Mon. Wea. Rev., 124, 2322–2339.

  • Kuo, Y-H., X. Zou, and Y-R. Guo, 1996: Variational assimilation of precipitable water using a nonhydrostatic mesoscale adjoint model. Part I: Moisture retrieval and sensitivity experiments. Mon. Wea. Rev., 124, 122–147.

  • Le Dimet, F. X., and O. Talagrand, 1986: Variational algorithms for analysis and assimilation of meteorological observations: Theoretical aspects. Tellus, 38A, 97–110.

  • Lemaréchal, C., 1978: Nonsmooth optimization and descent methods. International Institute for Applied System Analysis, Laxenburg, Austria, 25 pp.

  • Lemaréchal, C., and C. Sagastizabal, 1997: Variable metric bundle methods: From conceptual to implementable forms. Math. Programm., 76, 393–410.

  • Liu, D. C., and J. Nocedal, 1989: On the limited memory BFGS method for large scale optimization. Math. Programm., 45, 503–528.

  • Mahfouf, J-F., and F. Rabier, 2000: The ECMWF operational implementation of four-dimensional variational assimilation. Part II: Experimental results with improved physics. Quart. J. Roy. Meteor. Soc., 126, 1171–1190.

  • Navon, I. M., and D. M. Legler, 1987: Conjugate-gradient methods for large-scale minimization in meteorology. Mon. Wea. Rev., 115, 1479–1502.

  • Navon, I. M., X. Zou, J. Derber, and J. Sela, 1992: Variational data assimilation with an adiabatic version of the NMC spectral model. Mon. Wea. Rev., 120, 1433–1446.

  • Pan, H. L., and W. S. Wu, 1995: Implementing a mass flux convection parameterization package for the NMC medium-range forecast model. NMC/NOAA/NWS Tech. Rep., 409, 40 pp.

  • Pierrehumbert, R. T., 1986: An essay on the parameterization of orographic gravity wave drag. [Available from GFDI/NOAA, Princeton University, Princeton, NJ 08542.]

  • Sela, J., 1982: The NMC spectral model. NOAA Tech. Rep. NWS 30, 36 pp.

  • Sela, J., 1987: The new T80 NMC operational spectral model. Preprints, Eighth Conf. Numerical Weather Prediction, Baltimore, MD, Amer. Meteor. Soc., 312–313.

  • Xiao, Q., X. Zou, and Y-H. Kuo, 2000: Incorporating the SSM/I-derived precipitable water and rainfall rate into a numerical model: A case study for the ERICA IOP-4 cyclone. Mon. Wea. Rev., 128, 87–108.

  • Xu, Q., 1996a: Generalized adjoint for physical processes with parameterized discontinuities. Part I: Basic issues and heuristic examples. J. Atmos. Sci., 53, 1123–1142.

  • Xu, Q., 1996b: Generalized adjoint for physical processes with parameterized discontinuities. Part II: Vector formulations and matching conditions. J. Atmos. Sci., 53, 1143–1155.

  • Xu, Q., 1997a: Generalized adjoint for physical processes with parameterized discontinuities. Part III: Multiple threshold conditions. J. Atmos. Sci., 54, 2713–2721.

  • Xu, Q., 1997b: Generalized adjoint for physical processes with parameterized discontinuities. Part IV: Problems in time discretization. J. Atmos. Sci., 54, 2722–2728.

  • Xu, Q., and J. Gao, 1999: Generalized adjoint for physical processes with parameterized discontinuities. Part VI: Minimization problems in multidimensional space. J. Atmos. Sci., 56, 994–1002.

  • Xu, Q., J. Gao, and W. Gu, 1998: Generalized adjoint for physical processes with parameterized discontinuities. Part V: Coarse-grain adjoint and problems in gradient check. J. Atmos. Sci., 55, 2130–2135.

  • Zhang, S., 2000: Use of adjoint physics in 4D VAR with the NCEP Global Spectral Model. Ph.D. dissertation, The Florida State University, 185 pp.

  • Zhang, S., X. Zou, J. Ahlquist, I. M. Navon, and J. G. Sela, 2000: Use of differentiable and nondifferentiable optimization algorithms for variational data assimilation with discontinuous cost functions. Mon. Wea. Rev., 128, 4031–4044.

  • Zhu, Y., and I. M. Navon, 1999: Impact of key parameters estimation on the performance of the FSU spectral model using the full physics adjoint. Mon. Wea. Rev., 127, 1497–1517.

  • Zou, X., 1997: Tangent linear and adjoint of “on-off” processes and their feasibility for use in 4-dimensional variational data assimilation. Tellus, 49A, 3–31.

  • Zou, X., and Y-H. Kuo, 1996: Rainfall assimilation through an optimal control of initial and boundary conditions in a limited-area mesoscale model. Mon. Wea. Rev., 124, 2859–2882.

  • Zou, X., I. M. Navon, M. Berger, P. K. H. Phua, T. Schlick, and F. X. LeDimet, 1993a: Numerical experience with limited-memory quasi-Newton and truncated-Newton methods. SIAM J. Optimization, 3, 582–608.

  • Zou, X., I. M. Navon, and J. G. Sela, 1993b: Variational data assimilation with moist threshold processes using the NMC spectral model. Tellus, 45A, 370–387.

  • Zupanski, D., 1993: The effects of discontinuities in the Betts–Miller cumulus convection scheme on four-dimensional variational data assimilation in a quasi-operational forecasting environment. Tellus, 45A, 511–524.
  • Fig. 1. Variation of the cost function $J_{\mathrm{model}}$, defined as a weighted sum of the 6-h squared forecast errors [see (21)], using the NCEP adiabatic (top) and diabatic (bottom) global spectral models with a time filter coefficient $\alpha_A$ (left) and a horizontal diffusion coefficient $\alpha_{HD}$ (right) (see section 4). The cost function is evaluated through integrating the nonlinear models using 0.001 and $0.001 \times 10^{16}$ as the intervals of $\alpha_A$ and $\alpha_{HD}$

  • Fig. 2. Variation of the cost function $J_n$ of a single-variable discontinuous model [see (10) and (11)]. The number “n” represents the total time steps in the assimilation window

  • Fig. 3. Same as Fig. 2 except for gradient

  • Fig. 4. A schematic diagram of the OTC-AS cumulus parameterization scheme

  • Fig. 5. Variation of (a) the temperature adjustment ($T_2$) on the second model level using the OTC-AS parameterization scheme due to a perturbation of ΔT = 0.005 K added to the input temperature on the same level, (b) the nonlinear perturbation solution, and (c) the tangent linear approximation introduced by an input perturbation of temperature 0.005 K at the second model level

  • Fig. 6. Variation of the $l_2$ Euclidean norm of (a) the adjusted temperature and specific humidity [$R_1$, see (16)] of the OTC-AS parameterization at a column around 1.875°N, 87°E calculated at an interval of Δα = 0.01, (b) the nonlinear perturbation solution produced by an input perturbation of Δα = 0.01 [$R_2$, see (17)], and (c) the corresponding tangent linear approximation to the nonlinear perturbation [$R_3$, see (18)]

  • Fig. 7. Same as Fig. 6 except for all the 384 columns around latitude 3.75°N(S) and Δα = 0.001

  • Fig. 8. Variation of the cost function ($J_{AS}$) defined by (20) (thick dashed line), and the gradient ($\nabla_\alpha J_{AS}$) (thick solid line) calculated from the adjoint of the OTC-AS parameterization. A straight line crossing point P represents a local tangential slope derived from the adjoint. For graphing, the gradient is multiplied by a factor of 0.5. Both $J_{AS}$ and $\nabla_\alpha J_{AS}$ are calculated at an interval of α = 0.01

  • Fig. 9. Same as Fig. 8 except that the OTC-AS parameterization scheme is used on 384 columns around latitude 3.75°N(S) where the input state is varied through (15) using Δα = 0.001; observations are defined as the model solution of the OTC-AS parameterization on those columns around latitude 1.875°N(S), and a factor of $10^{-2}$ is used for graphing the gradient. A zero-gradient line (dotted line) is marked as a reference

  • Fig. 10. Same as Fig. 1 except for gradient. A zero-gradient line is marked as a reference (dotted line) for each panel

  • Fig. 11. The variation of log10(|Φg − 1|) for the OTC-AS parameterization column model with −β (testing the left derivative) and +β (testing the right derivative) at a discontinuity point (point P in Fig. 8). The definition of the cost function and the data used are the same as in Fig. 8. The basic perturbation is taken as ΔT and Δq [see Eq. (15)]

  • Fig. 12. Variation of (a) the nonlinear change of the perturbed cost function ($J_{AS}$) produced by Δα = 0.001 [see (15)], (b) the leading order (linear) approximation derived from the adjoint of the perturbation of $J_{AS}$, and (c) the ratio of the nonlinear change and the linear approximation, with α

Examination of Numerical Results from Tangent Linear and Adjoint of Discontinuous Nonlinear Models

S. Zhang
Geophysical Fluid Dynamics Laboratory, Princeton University, Princeton, New Jersey

X. Zou
Meteorology Department, The Florida State University, Tallahassee, Florida

Jon E. Ahlquist
Meteorology Department, The Florida State University, Tallahassee, Florida

Abstract

The forward model solution and its functional (e.g., the cost function in 4DVAR) are discontinuous with respect to the model's control variables if the model contains discontinuous physical processes that occur during the assimilation window. In such a case, the tangent linear model (the first-order approximation of a finite perturbation) is unable to represent the sharp jumps of the nonlinear model solution. Also, the first-order approximation provided by the adjoint model is unable to represent a finite perturbation of the cost function when the introduced perturbation in the control variables crosses discontinuous points. Using an idealized simple model and the Arakawa–Schubert cumulus parameterization scheme, the authors examined the behavior of a cost function and its gradient obtained by the adjoint model with discontinuous model physics. Numerical results show that a cost function involving discontinuous physical processes is zeroth-order discontinuous, but piecewise differentiable. The maximum possible number of involved discontinuity points of a cost function increases exponentially as $2^{kn}$, where k is the total number of thresholds associated with on–off switches, and n is the total number of time steps in the assimilation window. A backward adjoint model integration with the proper forcings added at various time steps, similar to the backward adjoint model integration that provides the gradient of the cost function at a continuous point, produces a one-sided gradient (called a subgradient and denoted as $\nabla_s J$) at a discontinuous point. An accuracy check of the gradient shows that the adjoint-calculated gradient is computed exactly on either side of a discontinuous surface. While a cost function evaluated using a small interval in the control variable space oscillates, the distribution of the gradient calculated at the same resolution not only shows a rather smooth variation, but also is consistent with the general convexity of the original cost function. The gradients of discontinuous cost functions are observed to be roughly smooth since the adjoint integration correctly computes the one-sided gradient on either side of a discontinuous surface. This implies that, although $(\nabla_s J)^T \delta x$ may not approximate $\delta J = J(x + \delta x) - J(x)$ well near the discontinuous surface, the subgradient calculated by the adjoint of discontinuous physics may still provide useful information for finding the search directions in a minimization procedure. While not eliminating the possible need for the use of a nondifferentiable optimization algorithm for 4DVAR with discontinuous physics, consistency between the gradient computed by adjoints and the convexity of the cost function may explain why a differentiable limited-memory quasi-Newton algorithm still worked well in many 4DVAR experiments that use a diabatic assimilation model with discontinuous physics.

Corresponding author address: Dr. S. Zhang, GFDL/NOAA, Princeton University, P.O. Box 308, Princeton, NJ 08542. Email: snz@gfdl.noaa.gov

1. Introduction

The goal of numerical weather prediction is to produce the most accurate forecasts possible. Once the equations are chosen and written in a form appropriate for computer solution, a global forecast depends on the initial conditions and model parameters. Four-dimensional variational data assimilation (4DVAR) chooses some or all of these input variables objectively. With 4DVAR, one begins by choosing a nonnegative forecast error function (also called a cost function), typically a quadratic form of the model state. Then one applies the calculus of variations to minimize the cost function by varying the input variables (also called control variables). The forecast model (called the assimilation model) serves as a strong constraint within the four-dimensional verification space of time and three-dimensional physical or Fourier-transformed space.

The cost function that is minimized in 4DVAR typically contains an observational term of the following form:
i1520-0493-129-11-2791-e1
This is a weighted sum of the squared differences between the forecast model solutions $x(t_i) = M(x_0)$ and observations $y^{\mathrm{obs}}$ in the time window between $t_0$ and $t_0 + t_R$, where $x_0$ includes the model state variable at the initial time $t_0$ and model parameters, $M$ is the operator representing the operations performed in the nonlinear model to obtain the model forecast at time $t$ from an initial condition at time $t_0$ ($t > t_0$), $y^{\mathrm{obs}}$ is a vector of observations, $\mathbf{R}$ is an observational error covariance matrix, and $H$ is an observation operator that converts the model variables on model grids to the observation quantities and locations. When the assimilation model contains no discontinuous process, $J_o$ is a continuously differentiable function of the control variables. Then a perturbation analysis of $J_o$ yields
$$\delta J_o = (\nabla J_o)^T \delta x_0 \qquad (2)$$
where $\nabla J_o$ is a column vector containing the gradient of the observational term of the cost function with respect to the control variables and $\delta x_0$ is a perturbation of the control variables. Determination of the sensitivity of a forecast response to variations in initial conditions, or the iterative determination of initial conditions that gradually reduces the values of a cost function, requires knowledge of the gradient of the forecast response or the cost function with respect to the control variables.
By differentiation of (1) and a little manipulation, Le Dimet and Talagrand (1986) showed how the gradient of $J_o$ can be found through a backward integration of the adjoint model from $t_R$ to $t_0$, that is,
i1520-0493-129-11-2791-e3
where $\mathbf{M} \equiv \partial M/\partial x_0$ is the tangent linear model (TLM) operator of the original forecast model (called the forward nonlinear model), and $\mathbf{M}^T$ is the adjoint model operator. Based on the same assumption of differentiability of the model solution and $J_o$, the perturbation analysis provides two procedures to check the correctness of the tangent linear model and the gradient of $J_o$ (Courtier and Talagrand 1987; Navon et al. 1992). Once $J_o$ and its gradient $\nabla J_o$ with respect to the control variables can be calculated, along with the total cost function $J$ and its gradient $\nabla J$, an unconstrained minimization algorithm (Liu and Nocedal 1989; Zou et al. 1993a) can be employed to find the “optimal” values of the control variables that minimize $J$.
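The backward integration and the gradient check described above can be sketched on a toy problem. The following is a minimal illustration, not the authors' code: it assumes a two-variable linear model, an identity observation operator, $\mathbf{R}^{-1}$ equal to a scalar times the identity, and a cost with a factor of 1/2. All function names and numerical values are hypothetical.

```python
def matvec(A, x):
    """Matrix-vector product for small dense matrices stored as lists of rows."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def forward(x0, A, n_steps):
    """Integrate the toy linear model x_{i+1} = A x_i and return every state."""
    states = [list(x0)]
    for _ in range(n_steps):
        states.append(matvec(A, states[-1]))
    return states

def cost(x0, A, y_obs, r_inv):
    """Assumed form: J_o = 1/2 * r_inv * sum_i |x(t_i) - y_obs(t_i)|^2 with H = I."""
    states = forward(x0, A, len(y_obs) - 1)
    return 0.5 * r_inv * sum(
        (xk - yk) ** 2 for x, y in zip(states, y_obs) for xk, yk in zip(x, y)
    )

def adjoint_gradient(x0, A, y_obs, r_inv):
    """Backward integration from the last time to t_0, adding a forcing at each time."""
    states = forward(x0, A, len(y_obs) - 1)
    At = transpose(A)
    lam = [0.0] * len(x0)
    for i in range(len(y_obs) - 1, -1, -1):
        forcing = [r_inv * (xk - yk) for xk, yk in zip(states[i], y_obs[i])]
        lam = [l + f for l, f in zip(lam, forcing)]
        if i > 0:
            lam = matvec(At, lam)  # one step of the adjoint (transposed) model
    return lam

def gradient_check(x0, A, y_obs, r_inv, beta):
    """Ratio of the finite change of J_o to its first-order estimate; tends to 1."""
    g = adjoint_gradient(x0, A, y_obs, r_inv)
    x_pert = [xk + beta * gk for xk, gk in zip(x0, g)]
    dJ = cost(x_pert, A, y_obs, r_inv) - cost(x0, A, y_obs, r_inv)
    return dJ / (beta * sum(gk * gk for gk in g))

A = [[0.9, 0.2], [-0.1, 0.8]]            # hypothetical model matrix
y_obs = forward([1.0, -0.5], A, 4)       # "observations" from a reference run
x0 = [0.6, 0.1]                          # first guess of the control variables
phi = gradient_check(x0, A, y_obs, 1.0, 1e-6)
```

For this smooth toy problem the ratio `phi` approaches 1 as `beta` shrinks, which is the behavior the gradient check relies on when the cost function is differentiable.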

When the assimilation model is diabatic, containing parameterized physics to account for the effects of the subgrid-scale processes (e.g., turbulence and cumulus convection) on the grid-resolvable scale state of the atmosphere, $J_o$ is discontinuous and nondifferentiable. These subgrid-scale physical processes in the atmosphere modify the local budget of mass, momentum, and energy and appear in the governing equations as source/sink terms. Usually, these processes depend strongly on local atmospheric conditions, that is, on the state in an individual grid-box column. These local conditions control “on–off” switches of the physical parameterizations. The on–off switches are governed by values of certain control variables or quantities derived from the control variables. If such a variable reaches a threshold value, these source/sink terms are turned on or off at a particular time step at a grid point.

Two examples are shown in Fig. 1, where $J_o$ (denoted as $J_{\mathrm{model}}$) is defined as a weighted sum of the 6-h forecast errors [see Eq. (21)]. For simplicity, the time filter coefficient ($\alpha_A$) and the horizontal diffusion coefficient ($\alpha_{HD}$) are chosen as the input control variables of $J_{\mathrm{model}}$ using an adiabatic (upper panels) and a diabatic (lower panels) National Centers for Environmental Prediction (NCEP) global spectral model, respectively. The values of $J_{\mathrm{model}}$ are calculated at an interval of 0.001 for $\alpha_A$ and $0.001 \times 10^{16}$ for $\alpha_{HD}$. The variation of $J_{\mathrm{model}}$ in the adiabatic case (Figs. 1a,c) shows a very smooth behavior in the control variable space, while the variation of $J_{\mathrm{model}}$ in the diabatic case is obviously nonsmooth.

Should we use the diabatic assimilation model in 4DVAR? A diabatic forecast model including parameterized physics usually simulates the evolution of the atmosphere better than an adiabatic one does. Theoretically, a more realistic assimilation model will produce better 4DVAR results than an adiabatic model if real observations are assimilated, not to mention the necessity of including model physics to assimilate some physical quantities such as clouds and precipitation. Therefore, efforts have been made to incorporate parameterized physics into 4DVAR experiments using primitive equation models (Zupanski 1993; Xiao et al. 2000; Zou et al. 1993b; Zou and Kuo 1996; Zhu and Navon 1999; Mahfouf and Rabier 2000). Numerical results in these studies showed not only reasonable convergence of minimization procedures in which differentiable minimization algorithms are used, but also various levels of improvement in analyses and model forecasts. In these experiments, some researchers retained the on–off switches of parameterizations in the tangent linear and adjoint models, as in the forward nonlinear model (Zou et al. 1993b; Zou and Kuo 1996; Zou 1997; Xiao et al. 2000; Mahfouf and Rabier 2000); this approach is hereafter referred to as the “classical” adjoint method. Others (Zupanski 1993; Zhu and Navon 1999) attempted to remove the on–off switches using a transitional smooth function to make the model solution and the cost function smooth.

In contrast, Xu (1996a), using a perturbation analysis starting from a simple analytic model with a step-function source term, demonstrated that the classical adjoint method fails to correctly evaluate the gradient of a cost function that contains on–off switches. He proposed a generalized adjoint formalism that accounts for the effect of on–off switches in parameterized physical processes (Xu 1996a,b). The generalized adjoint was then extended to a vector system (Xu 1996b), as well as to multiple threshold conditions (Xu 1997a) and the time discretization situation (Xu 1997b). Furthermore, a “coarse-grain” tangent linearization and adjoint was proposed to deal with on–off switches triggered at discrete time levels (Xu et al. 1998). Numerical results on minimization were presented using the Lorenz-63 model (Xu and Gao 1999). However, implementing their approach in data assimilation using a primitive equation model including complicated physical parameterization schemes seems difficult and requires further study.

In this study, we show that the cost function containing discontinuous parameterizations is zeroth-order discontinuous (i.e., the function itself is discontinuous due to on–off switches in the forward model), and that the classical adjoint method provides the one-sided gradient of a cost function containing discontinuous parameterizations. This one-sided gradient, the limit of the ratio of vanishingly small perturbations of the cost function and control variable, can be used to construct a subgradient in a nondifferentiable optimization algorithm such as the bundle method (Lemaréchal and Sagastizabal 1997; Zhang et al. 2000) for improving the minimization of discontinuous functions. We examine the behavior of such a discontinuous cost function in phase space. With the theoretical and numerical results from an adjoint model integration using the classical adjoint method, insights into the following questions are gained.

  1. What does the tangent linear model solution fail to represent in the presence of on–off switches?

  2. What does the result of an adjoint model integration represent, or fail to represent, in the presence of on–off switches?

  3. How can the correctness for both the tangent linear and adjoint models be checked in the presence of on–off switches?

  4. Why did a differentiable minimization algorithm still work well in many 4DVAR experiments that used a diabatic assimilation model with discontinuous physics?

This paper is arranged as follows. Section 2 uses an idealized simple model with a typical discontinuous source term to examine the behavior of the tangent linear and adjoint model solutions. A chain rule method is used to derive the gradient of a cost function at a continuous point, or a one-sided gradient at a discontinuous point, through an adjoint model integration. In section 3, a one-type cloud Arakawa–Schubert cumulus parameterization scheme employed in the NCEP global spectral model is used to investigate features of the forward model, cost function, TLM, and the adjoint model. Numerical results of the gradient, calculated using the adjoint of the NCEP diabatic model, are examined in section 4. Section 5 discusses the correctness check of the adjoint of the forward operator containing discontinuous physical processes. Summary and discussions are given in section 6.

2. Zeroth-order discontinuous cost function and its one-sided gradient calculation using classical adjoint

a. Zeroth-order discontinuous cost function

The governing equation of any numerical weather prediction (NWP) model can be written symbolically as
i1520-0493-129-11-2791-e4
where β represents a parameter vector that consists of physical parameters (such as the moistening parameter b in the Kuo scheme) and/or parameters introduced for computational stability (such as the filter coefficient). Numerical prediction of the state variable x is produced by numerical integration of a forecast system, which results from expanding the governing equations (4) onto a mesh system. When 𝗙(x, β) involves no discontinuous process (e.g., an adiabatic model), both the model solution and the cost function defined by the model solution are differentiable with respect to control variables x and/or β.

Consistent with the previous work (Xu 1996a,b), when the source/sink terms defined by parameterized physics are included in the assimilation model, 𝗙(x, β) of (4) contains on–off switches and is differentiable only between two neighboring switches. The model operator 𝗙(x, β) is thus piecewise differentiable. In this section, we first show that the model solution, which results from many time step integrations, and the cost function defined on it are zeroth-order discontinuous, that is, the functions themselves are discontinuous. In other situations, the cost function may be continuous but have discontinuous first- or higher-order derivatives. For simplicity, without changing the essence of the problem, we use a simple single-variable piecewise differentiable model to examine the behavior of the model solution in the time domain and a cost function defined by this model in the phase space.

Consider a forecast model with the following form
i1520-0493-129-11-2791-e5
where $f_1(x)$ and $f_2(x)$ are different [i.e., $f_1(x) \ne f_2(x)$ when $x \to x_c$] and both are differentiable functions. A similar step function was used in the “generalized” adjoint study (Xu 1996a). The (classical) tangent linearized version of (5) takes the form of
i1520-0493-129-11-2791-e6
where $f_1'$ and $f_2'$ represent the derivatives of $f_1$ and $f_2$ with respect to $x$, respectively. Starting from an initial value $x_0 < x_c$ and a perturbation $\Delta x_0$ that crosses $x_c$ (i.e., $x_0 + \Delta x_0 \ge x_c$) with a forward time integration scheme, the tangent linear and nonlinear changes of the perturbation through one time step integration are, respectively,
i1520-0493-129-11-2791-e7
where superscripts TL and NL denote a tangent linear and nonlinear growth of a small perturbation, respectively. Clearly, the tangent linear change $\Delta x_1^{TL}$ cannot approximate the nonlinear growth $\Delta x_1^{NL}$ when the small perturbation $\Delta x_0$ crosses a threshold at the first time step, no matter how small it is. Does this mean that the classical adjoint method fails to evaluate the gradient of the cost function containing discontinuous physics? In order to answer this question, we use the chain rule method to illustrate that an adjoint model integration provides a one-sided gradient at a discontinuous point of the cost function.
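The mismatch between the tangent linear and nonlinear changes can be reproduced with a few lines of code. This is a minimal sketch under stated assumptions: the branch functions `f1` and `f2`, the threshold `X_C`, and the time step `DT` are illustrative choices, not the specific forms used in (5), and a forward Euler step stands in for the forward time integration scheme.

```python
X_C = 1.0   # hypothetical threshold x_c
DT = 0.1    # hypothetical time step

def f1(x):
    """Tendency below the threshold (assumed form)."""
    return -x

def f2(x):
    """Tendency at or above the threshold (assumed form, source switched on)."""
    return -x + 1.0

def df1(x):
    return -1.0

def df2(x):
    return -1.0

def step(x):
    """One forward Euler step of the discontinuous nonlinear model."""
    return x + DT * (f1(x) if x < X_C else f2(x))

def tl_step(x, dx):
    """Classical tangent linear step: the switch is taken from the basic state x."""
    return dx + DT * (df1(x) if x < X_C else df2(x)) * dx

def compare(x0, dx0):
    """Return the nonlinear and tangent linear changes after one time step."""
    nl = step(x0 + dx0) - step(x0)
    tl = tl_step(x0, dx0)
    return nl, tl

# The perturbation straddles the threshold, however small it is made.
for d in (1e-2, 1e-4, 1e-6):
    nl, tl = compare(X_C - d / 2, d)
    print(d, nl, tl, nl - tl)
```

With these assumed branches the difference between the two changes stays at `DT * (f2(X_C) - f1(X_C))`, which does not shrink with the perturbation, whereas a perturbation that stays on one side of the threshold gives matching results.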
Using the same simple model, and considering the whole domain of the model state x after two time step integrations, one has four possible values for the model state at time t0 + 2Δt:
i1520-0493-129-11-2791-e9

As the n-step time integration proceeds for this one-threshold situation, the number of possible switch combinations increases exponentially as $2^n$. If the number of thresholds increases (say, to k), the number of possible on–off combinations increases as $2^{kn}$. The model state between two neighboring switches is differentiable. We note that in many situations, however, states at nearby grid points will be mutually compatible, and the effective number of possible different states will be smaller than $2^{kn}$.
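The counting argument above can be checked by brute-force enumeration. The sketch below is illustrative only: it lists every on–off history for k thresholds over n steps, and, for one threshold and two steps, forms the four candidate values obtained by formally applying either branch at each step. The branch maps are hypothetical forward Euler steps, not the model of this paper.

```python
from itertools import product

DT = 0.1  # hypothetical time step

def switch_histories(k, n):
    """Iterate over every on/off history for k thresholds over n time steps."""
    per_step = list(product((False, True), repeat=k))  # 2**k states per step
    return product(per_step, repeat=n)                 # (2**k)**n histories

def count_histories(k, n):
    return sum(1 for _ in switch_histories(k, n))

# Hypothetical one-step maps for a single threshold (k = 1).
BRANCH_STEP = (
    lambda x: x + DT * (-x),         # switch off
    lambda x: x + DT * (-x + 1.0),   # switch on
)

def two_step_values(x0):
    """Candidate states at t0 + 2*dt, keyed by the branch used at each step."""
    return {
        (i, j): BRANCH_STEP[j](BRANCH_STEP[i](x0))
        for i in (0, 1)
        for j in (0, 1)
    }

print(count_histories(1, 2), count_histories(2, 3))
print(two_step_values(0.5))
```

The enumeration gives an upper bound only; which histories are actually realized depends on the model state, consistent with the remark that the effective number is smaller than $2^{kn}$.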

A cost function defined as a continuous transformation of the model state at various times [usually a quadratic form such as the one in (11)] is also piecewise differentiable.

Let us consider a specific numerical example given by a forecast model
i1520-0493-129-11-2791-e10
and a series of cost functions
$$J_n(x_0) = x^2(t_n) \qquad (11)$$
where $t_n = t_0 + n\Delta t$. Using $t_0 = 0$ and $\Delta t = 0.1$, we obtain the distributions of $J_n$ in the control variable ($x_0$) space (Fig. 2). It is found that the cost functions consist of differentiable pieces between the discontinuity points. The longer the assimilation window (the more time integration steps), the more discontinuous points the cost function contains.
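The growth in the number of discontinuity points with the window length can be illustrated numerically. The sketch below uses a hypothetical single-variable model (relaxation toward 2 below the threshold, faster decay at or above it), not the model of (10); only the time step of 0.1 and the cost $J_n = x^2(t_n)$ follow the text. It counts the points on a grid of $x_0$ where the on–off history of the switch changes, which is where $J_n$ can jump.

```python
X_C, DT = 1.0, 0.1  # hypothetical threshold; time step as in the text

def rhs(x):
    """Hypothetical discontinuous tendency with a switch at X_C."""
    return 2.0 - x if x < X_C else -1.5 * x

def integrate(x0, n):
    """Return x(t_n) and the on/off history over n forward Euler steps."""
    x, history = x0, []
    for _ in range(n):
        history.append(x >= X_C)
        x = x + DT * rhs(x)
    return x, tuple(history)

def cost(x0, n):
    """J_n(x0) = x(t_n)**2."""
    return integrate(x0, n)[0] ** 2

def count_history_changes(n, lo=0.0, hi=2.0, points=2001):
    """Count grid intervals of x0 across which the switch history changes."""
    grid = [lo + (hi - lo) * i / (points - 1) for i in range(points)]
    histories = [integrate(x0, n)[1] for x0 in grid]
    return sum(1 for a, b in zip(histories, histories[1:]) if a != b)

counts = [count_history_changes(n) for n in (1, 2, 3)]
print(counts)
```

For this assumed model the count grows from 1 to 3 to 6 as n goes from 1 to 3 on the interval [0, 2], in line with the statement that a longer window produces more discontinuous points.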
Without loss of generality, the on–off switches in parameterizations can be expressed in the form of “if” statements:
i1520-0493-129-11-2791-eq1
where a is a scalar function of the model input and ac is a threshold value of a (which may also be a function of the model input).

Here, a = ac defines the locus of the discontinuities in input space. That locus will normally be a “hypersurface” with dimension m − 1, where m is the dimension of the input space (this picture may be complicated by various facts, and in particular by the fact that the functions a and ac, because of prior on–off processes, may themselves be discontinuous functions of the input). But if we accept that the locus of the discontinuities resulting from each individual on–off process is an (m − 1)-dimensional hypersurface, the hypersurfaces defined by all such processes in the model will partition the input space into m-dimensional cells, within each of which the cost function will be continuously differentiable. Therefore, we say, the zeroth-order discontinuous cost function defined on parameterized physics is piecewise differentiable.

b. Calculation of one-sided gradient of a zeroth-order discontinuous cost function by classical adjoint approach

The analysis of section 2a showed that the cost function defined on parameterized physics with on–off switches is piecewise differentiable. In coding the adjoint, the classical adjoint approach conducts the linearization and transposition separately in each cell. The approach evaluates the gradient of the cost function at a given point in control variable space, say, x0, which is located within a certain cell, by the chain rule as
i1520-0493-129-11-2791-e12
where n is the dimension of the model state, and x0 = (x01, x02, … , x0n)T and x = (xt1, xt2, … , xtn)T represent the initial condition and the model state at time t, respectively. Here 𝗠 is a linear propagator derived from differentiating 𝗙(x, β); it consists of a product of tR differentiable subpropagators, one for each time step integration, where tR denotes the total number of time steps, that is,
$$\mathsf{M} = \mathsf{M}_{t_R}\,\mathsf{M}_{t_R-1}\cdots\mathsf{M}_{1}. \tag{13}$$
Here
i1520-0493-129-11-2791-eq2
where each element in 𝗠t can be evaluated within an individual cell. Therefore, we have
i1520-0493-129-11-2791-e14
that is, an integration of the adjoint model containing discontinuous physical parameterizations is able to evaluate the one-sided gradient of the cost function at a discontinuity point as accurately as it does at continuous points.

Figure 3 displays the variations of the gradients of the cost functions defined in (11) using the model (10) (the cost functions were shown in Fig. 2). These gradients, dJn/dx0, are calculated by the chain rule as in (12) for a one-dimensional control variable situation, in which the adjoint model of (10) is developed using the classical adjoint method, that is, keeping the on–off switches the same as in the nonlinear model. The gradients evaluated from the classical adjoint integration are found to be identical to the direct analytical solution.
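The chain-rule argument can be tried on a toy problem. The sketch below does not use model (10); it assumes a hypothetical scalar model with one threshold `XC`, two made-up branches, and the cost J = x(t_n)^2. The adjoint keeps each switch exactly as it was set in the nonlinear run, and its result is compared with one-sided finite difference ratios on each side of the discontinuity and with a ratio that straddles it.

```python
DT, XC = 0.1, 1.0   # time step from the text; hypothetical threshold

def forward(x0, n_steps):
    """Nonlinear run; also stores the slope dx(t+1)/dx(t) of the branch taken."""
    x, slopes = x0, []
    for _ in range(n_steps):
        if x < XC:
            x, m = x + DT * (2.0 - x), 1.0 - DT
        else:
            x, m = x + DT * (-3.0 * x), 1.0 - 3.0 * DT
        slopes.append(m)
    return x, slopes

def cost(x0, n_steps):
    return forward(x0, n_steps)[0] ** 2

def adjoint_gradient(x0, n_steps):
    """Chain rule run backward, with the switches frozen as in the forward run."""
    xn, slopes = forward(x0, n_steps)
    adj = 2.0 * xn                 # dJ/dx(t_n) for J = x(t_n)^2
    for m in reversed(slopes):
        adj = m * adj              # transpose of a 1x1 subpropagator
    return adj

def fd_ratio(x0, n_steps, dx):
    """Finite difference ratio (J(x0 + dx) - J(x0)) / dx; dx may be negative."""
    return (cost(x0 + dx, n_steps) - cost(x0, n_steps)) / dx

n = 3
right = XC            # the threshold itself belongs to the `else` branch
left = XC - 1e-9      # a point just inside the neighboring cell
print("right: ", adjoint_gradient(right, n), fd_ratio(right, n, +1e-6))
print("left:  ", adjoint_gradient(left, n), fd_ratio(left, n, -1e-6))
print("across:", fd_ratio(left, n, +1e-6))   # this ratio straddles the jump
```

On each side the adjoint value agrees with the one-sided difference ratio, while the two sides give different slopes. The ratio taken across the threshold is dominated by the jump in J and grows without bound as the perturbation shrinks.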

The derivations and numerical examinations shown above suggest that the accuracy with which the adjoint evaluates the gradient of the cost function is identical to the accuracy of the linearization that is conducted on the forward model for building the tangent linear model. In other words, wherever the tangent linearization is conducted, the adjoint is unambiguously defined. At either side of a discontinuity point, the adjoint approach numerically determines the one-sided gradient of the cost function as accurately as the linearization is conducted at that side. This is consistent with some theoretical understanding (Errico 1997). In practice, for a continuous function, one often uses a linear change along the tangent direction to approximately describe a finite nonlinear perturbation, that is, the tangent linear approximation. The analysis above shows that the accuracy of the calculated one-sided gradients has nothing to do with the invalidity of the tangent linear approximation for a finite nonlinear perturbation that crosses the discontinuity point, that is, ΔJ ≠ 𝗠Δx, and is independent of the magnitude of the finite perturbations Δx. The adjoint approach deals with vanishingly small perturbations, while the tangent linear approximation deals with finite perturbations. The degree of accuracy of the tangent linear approximation says something about the applications for which a numerically computed gradient can be useful, but that is a distinctly different question.

Therefore, the classical adjoint approach does not work for finite perturbations in a nonlinear and discontinuous system, that is, for computing finite difference ratios of the form ΔJ/Δxi for a given J defined by a discontinuous model solution. To determine such finite difference ratios numerically in a discontinuous nonlinear system, one explicitly computes the perturbation ΔJ for each input perturbation Δxi. However, to deal with the spikes of ΔJ/Δxi, some new approach such as the generalized adjoint (Xu 1996a,b; 1997a,b) has to be considered.

3. Numerical results using a one-type cloud Arakawa–Schubert cumulus parameterization

Sensitivity examinations of various parameterizations in the NCEP model showed that a simplified (one-type cloud) Arakawa–Schubert (hereafter referred to as OTC-AS) cumulus parameterization scheme (Pan and Wu 1995) causes the largest discontinuity of the model response (Zhang 2000). We therefore choose the OTC-AS cumulus parameterization scheme as a discontinuous model operator to investigate the features of the TLM and adjoint model solutions.

a. Typical discontinuities

The original Arakawa–Schubert cumulus parameterization scheme (Arakawa and Schubert 1974) describes an ensemble of all-type clouds. In the implementation of the NCEP global model, the scheme was simplified as a one-type cloud (Pan and Wu 1995). In the simplified scheme, a simple cloud model is used to describe the thermodynamical properties, as illustrated schematically by Fig. 4. The one-type cloud definition includes the thickness of the cloud/subcloud (updraft) layer that is greater/less than 150 hPa. This cloud/subcloud layer is determined by the vertical profiles of moist static energy and/or saturated moist static energy. Conditional instability defines the originating height of updraft (entrainment), where moist static energy reaches its maximum in the lower troposphere. At or above this height, a disturbed moist air parcel ascends adiabatically, due to buoyancy, until saturated where the cloud base is defined by lifting condensation level. Afterward, the moist air parcel will continuously rise along a constant moist static energy line (constant θse line), assuming there is no heat exchange between the air parcel and the environment. The cloud top is defined when the vapor in the air parcel condenses completely.

Fillion and Errico (1997) studied the accuracy of linearized convection operator and the resulting impact on minimization using a column model relaxed Arakawa–Schubert scheme. Our study examines the features of the solutions of the tangent linear model and the adjoint model of the discontinuous OTC-AS parameterization. Generally speaking, any “if” statement in a parameterization scheme is able to cause discontinuities in model response. In the OTC-AS parameterization, obvious discontinuities occur when the following physical considerations are implemented in the computer code.

  1. The convection process is turned on/off as the restriction of the thickness of the cloud layer and/or subcloud layer is reached/broken.

  2. The convection process is turned on/off as the cloud work functions cross their critical values.

  3. The level indices representing the cloud/subcloud base/top heights are shifted up/down if a small change occurs in temperature and moisture profiles.

In order to thoroughly investigate the impact of these discontinuities on the solutions of the TLM and the adjoint, we will examine the features of the nonlinear, tangent linear, and adjoint solutions of the OTC-AS parameterization in three cases. First, the input temperature at a single model-level of a sounding is changed (case 1). Second, the input temperature and moisture at all levels of a single sounding are varied (case 2). Finally, the input temperature and moisture are varied on multicolumns (case 3). The data used are extracted from the NCEP reanalysis dataset at 0000 UTC 1 November 1995.

b. Features of the nonlinear and tangent linear solutions

Figure 5 exhibits the numerical results of case 1, in which only the temperature at the second model level from the ground (σ = 0.9821) of a single column located at latitude 1.875°N and longitude 87°E [corresponding to model grid (43, 47) in a 62-wavenumber triangular-truncation spectral model] is varied using an interval of 0.005 K. The other input variables (specific humidity, wind speed, vertical velocity, and the surface pressure) are kept unperturbed. The model solution, which is the adjusted temperature increment resulting from the OTC-AS parameterization at this model level, has two discontinuity points around T = 300.15 K and T = 300.7 K, where the OTC-AS cumulus convection process is turned on and off, respectively (Fig. 5a). Using 0.005 K as a small perturbation of the input temperature, we examine the features of the nonlinear and tangent linear perturbation solutions (Figs. 5b, 5c) resulting from a small perturbation made to each input temperature shown in Fig. 5a. The perturbations in Figs. 5b and 5c are identical, except at the discontinuity points. Figure 5a shows that the adjusted temperature varies in effect linearly between the discontinuity points, so that the tangent linear model solution is nearly constant. At discontinuity points, Fig. 5b (nonlinear model solution) shows peaks in perturbations that are absent from Fig. 5c (TLM solution). These peaks correspond to cases where the unperturbed and perturbed input temperatures lie on different sides of the discontinuity. The tangent linear model is, of course, unable to produce the corresponding jumps in the adjusted temperature. However, except at the discontinuity points, the tangent linear model is as good for computing perturbations as the full model (to first-order accuracy).

If more than one input variable is varied (cases 2 and 3, for instance), the following perturbing method and the response functions of the model (nonlinear and tangent linear) output are defined so that the relation of the nonlinear and tangent linear solutions and the input variable can still be presented graphically. The input temperature and specific humidity are varied by
i1520-0493-129-11-2791-e15
where ΔT = Tr − Tu and Δq = qr − qu. The subscripts u and r, respectively, denote the “unperturbed” state and “reference” state of temperature and specific humidity profiles. Again, the other input variables (wind speed, vertical velocity, and the surface pressure) are kept unperturbed. Once the unperturbed state and reference states are chosen, the parameter α is varied to control the magnitude of the input perturbations αΔT and αΔq so that a multidimensional problem is converted to a one-dimensional problem that can easily be examined graphically.
The following three scalars, R1, R2, and R3, are defined using l2 Euclidean norm to represent the temperature and moisture increments due to cumulus convection (R1), the nonlinear perturbation of the output temperature and moisture variables caused by a small change in the input variables (R2), and the tangent linear solution (R3), respectively:
i1520-0493-129-11-2791-e16
where C represents the nonlinear operator of the OTC-AS parameterization, I is the identity matrix, and the weighting coefficients used for the norm of temperature and specific humidity variables are 10^−4 K^−2 and 10^6 kg^2 kg^−2, corresponding to typical orders of simulated temperature and specific humidity errors of 1 K and 1 g kg^−1;
$$R_2(\alpha) = \lVert C(\alpha + \Delta\alpha) - C(\alpha) \rVert^2, \tag{17}$$
where
i1520-0493-129-11-2791-e18
where 𝗖(α) = ∂C(α)/∂x is the tangent linear operator of the OTC-AS parameterization and Δx = (ΔT, Δq)T.

The features of various model solutions with respect to the multidimensional control variable can now be illustrated in a single variable space of α.
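The reduction to a single parameter α can be sketched on a toy column. Everything below is hypothetical: the four-component state, the threshold `A_C`, the mixing matrix `C_ON`, and the weights `W` stand in for the OTC-AS scheme and the NCEP profiles, and the three scalars are assumed to be weighted squared norms matching the verbal definitions of R1, R2, and R3 above.

```python
import numpy as np

# State x = (T_low, T_high, q_low, q_high); all numbers are hypothetical.
W = np.array([1.0, 1.0, 1.0e6, 1.0e6])         # norm weights (K^-2, kg^2 kg^-2)
X_U = np.array([295.0, 288.0, 0.012, 0.004])   # "unperturbed" state, scheme off
X_R = np.array([301.0, 286.0, 0.016, 0.005])   # "reference" state, scheme on
A_C = 10.0                                     # threshold on T_low - T_high (K)
C_ON = np.array([[0.9, 0.1, 0.0, 0.0],         # mixing applied when active
                 [0.1, 0.9, 0.0, 0.0],
                 [0.0, 0.0, 0.9, 0.0],
                 [0.0, 0.0, 0.1, 1.0]])

def line(alpha):
    """Input state along the line x_u + alpha * (x_r - x_u)."""
    return X_U + alpha * (X_R - X_U)

def active(x):
    return x[0] - x[1] > A_C

def scheme(x):
    """Toy on-off adjustment: linear mixing when active, identity otherwise."""
    return C_ON @ x if active(x) else x.copy()

def tangent(x, dx):
    """Tangent linear of `scheme`, with the switch taken from the basic state x."""
    return C_ON @ dx if active(x) else dx.copy()

def wnorm2(v):
    return float(np.sum(W * v * v))

D_ALPHA = 0.01
alphas = np.round(np.arange(100) * D_ALPHA, 2)
r1 = np.array([wnorm2(scheme(line(a)) - line(a)) for a in alphas])
r2 = np.array([wnorm2(scheme(line(a + D_ALPHA)) - scheme(line(a))) for a in alphas])
r3 = np.array([wnorm2(tangent(line(a), D_ALPHA * (X_R - X_U))) for a in alphas])

crossing = np.array([active(line(a)) != active(line(a + D_ALPHA)) for a in alphas])
print("alpha where the switch flips:", alphas[crossing])
print("R2 and R3 there:", r2[crossing], r3[crossing])
print("max |R2 - R3| elsewhere:", np.max(np.abs(r2 - r3)[~crossing]))
```

Because each branch of the toy scheme is linear, the nonlinear and tangent linear perturbation norms coincide wherever α and α + Δα fall in the same cell. They differ by orders of magnitude only at the single α value where the perturbation flips the switch.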

For the numerical test conducted below, we choose a vertical column where the convection process is inactive [the model grid (44, 47) at 1.875°N, 87°E] as the basic state (Tu and qu), and another vertical column where the convection process is active [the model grid (43, 47) at 1.875°N, 85°E] as the reference state (Tr and qr). We can therefore expect the nonlinear model response to have at least one discontinuity point between α = 0 and α = 1, where the convection process is turned on.

Figure 6a shows the distribution of R1 with respect to α calculated at an interval of 0.01 (equivalent to a temperature interval of about 0.005 K and specific humidity interval of about 0.05 g kg−1). For each of these α values, we then add a perturbation of Δα = 0.01 to the input variables of the OTC-AS cumulus parameterization scheme. Results of the nonlinear perturbation and tangent linear perturbation are shown in Figs. 6b and 6c, respectively. From Fig. 6a, we found that when α varies from 0 to 1 the convective process is activated at α = 0.43, indicating the occurrence of discontinuity of the forward model solution in the phase space of α. Another discontinuity occurs near α = 1.28 when the cumulus cloud top moved from the 12th to 13th model level. The results are qualitatively similar to those shown in Fig. 5. Outside and between discontinuous points, R1 varies in effect linearly, so that the tangent linear model represents perturbations as accurately as the nonlinear model (Figs. 6c and 6b, respectively). At discontinuity points, the tangent linear model is unable to represent the sharp jumps in R1.

The above experiments were repeated by including all the columns (384 model grids) on two selected latitude belts, for example, a pair of Gaussian latitude circles symmetric with respect to the equator. The temperature and moisture profiles derived from the NCEP reanalysis on the pair of Gaussian latitude circles on 3.75°N(S) were used as Tu and qu and those on 1.875°N(S) as Tr and qr in (15). Numerical results of R1, R2, and R3 are shown in Fig. 7. The nonlinear model solutions (Fig. 7a) contain many discontinuous points. The results are, again, very similar to those shown in Figs. 5 and 6. The tangent linear model accurately represents the (quasi linear) variations of R1 between discontinuous points, but does not represent the jumps at those points. These results are consistent with the examination using idealized examples (Xu 1996a,b; 1997a,b).

c. Features of the adjoint model solutions

In order to examine the features of the adjoint model solution of the OTC-AS scheme, we first define a cost function
i1520-0493-129-11-2791-e20
where the norm ‖ ‖ is defined the same as in (16). We assume that the observations of the temperature and specific humidity (Tobs, qobs) are the output of the parameterized convection process operating on the reference state. The cost function and its gradient are evaluated for different α values with an interval of Δα = 0.01. The corresponding numerical results are shown in Fig. 8 for both JAS (dashed line) and ∇JAS (solid line). Within the range of α from 0 to 2, JAS has two discontinuity points (as described in Fig. 6). The gradient calculated by the adjoint model integrations, ∇JAS, is also discontinuous at these two points. However, the slopes given by the values of the adjoint-calculated gradient (for example, the thin solid lines crossing points P and Q) represent the one-sided tangent directions of JAS on both sides of a discontinuity point. Results in Figs. 8 and 9 are consistent with the theoretical results of the adjoint model integration obtained in section 2 using the chain rule method. They may also explain why the minimization with active discontinuous physics still performed reasonably well in the work of Zou and Kuo (1996), Fillion and Errico (1997), and Fillion and Mahfouf (2000).

Using the same data as Fig. 7 (case 3), we compute JAS defined on 384 columns by (20) (thick-dashed line) and its gradient ∇αJAS (solid line) with different values of α using the interval Δα = 0.001 to generate Fig. 9. In this case, the “observations” (Tobs and qobs) are the temperature and moisture profiles after the convective adjustment on 384 columns on the latitudes 1.875°N(S). Similar to Fig. 7a, JAS contains many discontinuity points. Between two neighboring discontinuity points, however, JAS is continuous and differentiable. Therefore JAS is piecewise differentiable on the whole α domain. When the number of the discontinuity points increases, it becomes difficult to identify these differentiable pieces, and the cost function may appear to be oscillating with respect to a control parameter (see Figs. 1c,d). Compared to the variation of JAS, the distribution of the gradient derived from the adjoint integration is rather smooth, with only one stationary point. Clearly, the adjoint model integration of a discontinuous physical parameterization still provides either a tangential slope at a continuity point (point A, for instance) or a one-sided tangential slope at a discontinuity point (point B or C, for instance). It is also notable that the variation of the gradient with respect to α is in fact linear, which means that the variations of the cost function, in between its obvious discontinuity points, are quadratic, a feature that helps a minimization work well.

4. Examining the gradients calculated using the adjoint of the NCEP diabatic global spectral model

a. Description of discontinuous physical processes and selection of control parameters

The NCEP global spectral model consists of a set of prognostic equations, including the tendency equations of vorticity, divergence, virtual temperature, specific humidity, and the surface pressure (Sela 1982, 1987). The model physics include large-scale precipitation, cumulus convection, shallow convection, vertical diffusion, gravity wave drag, and surface and boundary processes. The large-scale (grid scale) condensation consists of condensation in supersaturated layers and evaporation in unsaturated layers. It adjusts both the temperature and specific humidity. The cumulus parameterization scheme is that described in section 3. The shallow convection is the Betts scheme (Betts 1986). A nonlocal vertical diffusion scheme is adopted (Hong and Pan 1996). The gravity wave drag parameterization is the Pierrehumbert (1986) scheme. The surface processes include a two-layer soil model. In the related processes the Monin–Obukhov similarity theory is applied to simulate the change of the surface (air–ground interface) temperature and humidity, and to calculate exchange coefficients and turbulence fluxes in the planetary boundary layer. These physical parameterization schemes are included in both the nonlinear and the adjoint models. Longwave and shortwave radiation are kept constant during the 6-h assimilation window.

The model dynamical core includes horizontal diffusion and time filtering in which the filter coefficient (αA) is taken as 1 − 2ϵ, where ϵ is the classical Asselin filter coefficient. The model carries out the time filter and horizontal diffusion at each time step integration. Although both the time filter coefficient αA and the horizontal diffusion coefficient αHD are not explicitly related to the model physical processes, any change in the values of these coefficients can affect model state and thus on–off switches in the parameterization schemes during the time integration. Therefore, these coefficients (αA or αHD) play equivalent roles to α in (15), where α controls the magnitude of perturbation of a multidimensional control variable such as initial condition. Then, any cost function (J) defined by the model solution and its gradient (∇J) can be evaluated using a series of αA or αHD values rather than perturbing initial conditions so that the features of the J and ∇J can be illustrated in αA or αHD space. Therefore, this study selects αA and αHD as the control variables to represent some numerical features of discontinuity in 4DVAR.

b. Examinations of the calculated gradient through the NCEP diabatic adjoint

In order to examine the gradient evaluated by the adjoint of a diabatic version of the NCEP model, we define the cost function using the model solution and the NCEP analysis as
i1520-0493-129-11-2791-e21
where 〈,〉 represents the dot product of two vectors, which defines the l2 Euclidean norm. Here α represents the control variable vector and we choose α = (αA, αHD)T in this case. Here x_analysis is the analysis state at 0600 UTC on 1 November 1995, representing observations yobs in (1), and x_6-h forecast is the 6-h model forecast state [M(x0)] starting from the initial state (x0) at 0000 UTC on 1 November 1995. The model resolution is 62 waves (not including the zonal mean) in the horizontal domain and 28 vertical levels. The state vector x consists of divergence (D), vorticity (ζ), virtual temperature (T), specific humidity (q), and the surface pressure (Ps), that is, x = (D, ζ, T, q, Ps)T. The observational operator H is the identity matrix in this case. Finally, 𝗪 is a diagonal weighting matrix approximated by the inverse of the absolute value of maximal 6-h differences of state variables. In this case, the diagonal values of 𝗪 are 1.2 × 10^−6 s^−1 for divergence, 2.4 × 10^−6 s^−1 for vorticity, 1 K for temperature, and 10 mb for surface pressure.

Variations of the cost function Jmodel defined by (21) and its gradient evaluated using the adjoint of the NCEP adiabatic (top panel) and diabatic (bottom panel) nonlinear forward models with respect to αA (left) and αHD (right) are shown in Figs. 1 and 10, respectively. The standard values of αA and αHD in the NCEP operational model are 0.92 and 0.3 × 10^16, respectively. Values of both Jmodel and ∇Jmodel are computed at an interval of 0.001 for αA and 0.001 × 10^16 for αHD. Comparing Figs. 1c,d with Figs. 10c,d, we found that, although the diabatic Jmodel appears jagged with respect to both coefficients due to the jumps in the nonlinear model solution, the variation of the gradients calculated using the diabatic adjoint does not follow the jagged aspect, but is consistent with the general convexity of Jmodel. There is only one point in the phase space at which the gradient is zero, a necessary condition for a minimum. These characteristics of the calculated gradient confirm that an adjoint integration evaluates the one-sided gradient at a discontinuity point as correctly as it does the gradient at a differentiable point. In fact, Jmodel in Figs. 1c,d can be viewed as an extension of the JAS in Fig. 9, which is piecewise differentiable with many more discontinuities when all the discontinuous parameterization schemes are used in a global domain within a 6-h assimilation window (a total of 12 time steps using 30 min as the time integration step size). When the one-dimensional input space contains too many discontinuities, the cost function curve may appear “oscillating” within a finite interval of αA (αHD) while the corresponding gradient curve appears relatively smooth. The phenomenon may be case dependent because there is no guarantee that the cost function produced by the diabatic model has only zeroth-order discontinuities.
In addition, we notice that the variations of the gradient are more linear (and the cost function must be more quadratic) in the diabatic case than in the adiabatic case, implying that the Hessian matrix is nearly constant in the diabatic case. Further research is of course needed to examine the nature of a diabatic cost function and clarify these issues, especially when the model resolution is increased.

5. Correctness check of the TLM and adjoint over multiphysical regimes in the presence of discontinuities

Traditionally, the correctness of the TLM and adjoint model is checked by examining the first-order approximations to a nonlinear perturbation of both the model solution (TLM test) and the cost function (gradient test). Typically, a ratio (Φl for TLM test and Φg for gradient test) is defined to measure the discrepancy between the nonlinear perturbation and its first-order approximation as
i1520-0493-129-11-2791-e22
where ‖·‖ represents the l2 Euclidean norm, M the nonlinear forward operator, 𝗠 (= ∂M/∂α) the tangent linear operator of M, J the defined cost function such as in (20) or (21), ∇αJ the gradient derived from an adjoint integration, and β a scalar controlling the size of the input perturbation. For a differentiable M or J, once the basic state α (usually an initial analysis) and the perturbation δα (usually taken as ∇J/‖∇J‖) are chosen, the ratio is expected to approach 1 linearly, with increasing accuracy as β becomes progressively smaller. Assuming Φl(β) = 1 + Aβ and β = 10^−i, the function log(Φl − 1) (= log A − i) is a linear function of i.
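The linear behavior of log|Φ − 1| in i is easy to observe on a smooth function. The sketch below assumes the usual form of the gradient-test ratio, Φg(β) = [J(α + βδα) − J(α)] / (β δαᵀ∇J), which matches the verbal description of the numerator and denominator; the quartic `cost` and its analytic `grad` are hypothetical stand-ins for a model cost function and an adjoint-computed gradient.

```python
import numpy as np

def cost(x):
    """Smooth toy cost function (hypothetical stand-in for J)."""
    return 0.5 * np.sum(x ** 2) + 0.25 * np.sum(x ** 4)

def grad(x):
    """Analytic gradient; plays the role of the adjoint-computed gradient."""
    return x + x ** 3

def phi_g(x, h, beta):
    """Ratio of the nonlinear change in J to its first-order approximation."""
    return (cost(x + beta * h) - cost(x)) / (beta * np.dot(h, grad(x)))

x = np.array([1.0, -2.0, 0.5])          # basic state
g = grad(x)
h = g / np.linalg.norm(g)               # perturbation direction grad J / |grad J|

exponents = np.arange(2, 7)             # beta = 10^-i for i = 2, ..., 6
logs = np.array([np.log10(abs(phi_g(x, h, 10.0 ** (-i)) - 1.0)) for i in exponents])
for i, value in zip(exponents, logs):
    print(f"i={i}  log10|Phi_g - 1| = {value:.4f}")
print("slopes between successive i:", np.diff(logs))
```

A slope near −1 between successive values of i indicates that the gradient is correct to first order. In double precision the trend eventually breaks down for very small β, when roundoff in the difference J(α + βδα) − J(α) takes over.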

In section 2, we proved that the adjoint integration provides the one-sided gradient for the cost function defined by the parameterized physics at a discontinuity point. In order to examine the conclusion numerically, we use the gradient of JAS [see (20)] evaluated by the adjoint of the OTC-AS parameterization column model at either side of a discontinuity point (points P and Q in Fig. 8) to conduct the gradient test. Figure 11 presents the variation of log(|Φg − 1|) with respect to log|β| for a left (−β) and right (+β) perturbation [the basic perturbation is taken as ΔT and Δq, see (15) for case 2]. It is found that, at the discontinuous point, the gradients computed by the adjoint of the OTC-AS parameterization are correct from either side, although the two gradients (at points P and Q in Fig. 8) are different.
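A one-sided version of the same test can be sketched for a cost function with a jump. The piecewise quadratic `cost`, the location `A_DISC`, and the jump height below are hypothetical and are not JAS; the points `p` and `q` play the roles of P and Q, and the gradient-test ratio is assumed to have the usual form of nonlinear change over first-order approximation.

```python
import numpy as np

A_DISC = 1.0   # hypothetical location of the discontinuity

def cost(a):
    """Piecewise quadratic with a jump at A_DISC."""
    return (a - 2.0) ** 2 if a < A_DISC else 0.5 * (a - 2.0) ** 2 + 0.3

def gradient(a):
    """Derivative of the piece that contains `a` (switch kept as in `cost`)."""
    return 2.0 * (a - 2.0) if a < A_DISC else (a - 2.0)

def phi_g(a, beta):
    """Gradient-test ratio for a signed perturbation beta."""
    return (cost(a + beta) - cost(a)) / (beta * gradient(a))

p = np.nextafter(A_DISC, 0.0)   # last representable point on the left piece
q = A_DISC                      # first point on the right piece
for i in range(2, 7):
    beta = 10.0 ** (-i)
    print(i,
          np.log10(abs(phi_g(p, -beta) - 1.0)),   # left-sided test at P
          np.log10(abs(phi_g(q, +beta) - 1.0)),   # right-sided test at Q
          phi_g(p, +beta))                        # perturbation crosses the jump
```

The two one-sided columns fall by one unit per step in i even though the gradients at `p` and `q` differ. The last column, where the perturbation crosses the jump, moves away from 1 as β shrinks.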

Considering that the on–off switches in model parameterizations may produce lots of discontinuities in practice, and one does not always know where these discontinuities are, a test that conducts the computation of Φg over a range of basic states may provide more information to verify the usefulness of an adjoint. With the same technique used in Fig. 7, we can check the feature of the adjoint solution with respect to a set of basic states defined by the different α values. Figure 12 exhibits the features of the nonlinear change (the numerator of Φg expression; Fig. 12a) and the first-order approximation (the denominator of Φg expression; Fig. 12b) of a perturbed cost function using the OTC-AS parameterization and its adjoint, and their ratio (Φg; Fig. 12c). The cost function is defined by (20) and the basic state by (15) where the states on the latitudes of 1.875°N(S) and 3.75°N(S) are used as Tr, qr and Tu, qu through a 0.001 interval of α.

Figure 12 shows that the ratio of nonlinear to linear perturbations derived from the adjoint (Figs. 12a and 12b, respectively) leads to conclusions similar to those obtained from Figs. 5–7. Between discontinuous points, the ratio is close to 1, meaning that the linear approximation derived from the adjoint accurately describes the nonlinear perturbations of the cost function. This is true except in the vicinity of the point α = 1, where the gradient is equal to 0 and the ratio becomes infinite. And, at discontinuous points, the linear approximation does not capture the sharp variations of the cost function JAS.

6. Summary and conclusions

A diabatic forecast model including parameterized physics simulates the evolution of the atmosphere better than an adiabatic one does. Efforts have been made to incorporate parameterized physical processes into 4DVAR experiments to include satellite and radar measurements of rain rates and cloud properties and thus to enhance forecast skills. However, physical parameterizations include on–off switches that come from regime changes in the physical processes of the real atmosphere due to the numerical discretization. Many of them are hard to remove. Examples include the triggering of moist processes and different regimes in the surface processes. These on–off switches make the nonlinear model solution and the cost function defined in 4DVAR discontinuous.

By examining the numerical results from the tangent linear and adjoint operators of a simple idealized model, a simplified Arakawa–Schubert cumulus parameterization scheme, and the NCEP diabatic model, along with an analytical explanation for these calculated results, we illustrated that (i) the tangent linear approximation of a diabatic model is valid only in the differentiable interior of cells bounded in input space by discontinuity surfaces and that it cannot represent the nonlinear perturbation growth when a perturbation crosses a discontinuity; (ii) the gradient calculated using the adjoint of a parameterization or a diabatic model correctly reflects a local tangential (or one-sided tangential) slope of the cost function, since the adjoint model integration represents a numerical implementation of the chain rule, using a complex NWP model as a compound transformation; and (iii) testing the tangent linear and adjoint codes using only a single basic state may be inadequate to ensure the correctness of the codes in all physical regimes when the model is discontinuous, so that additional checks based on different basic states are recommended.

The gradient calculated using an adjoint model mainly serves as a descent direction in minimization (Navon and Legler 1987; Liu and Nocedal 1989; Zou et al. 1993a). A commonly used minimization algorithm, such as the limited-memory quasi-Newton algorithm, utilizes this information about local tangential slopes to define a search direction at each iteration. Therefore, the cost function in minimization can always be viewed as a smoothed version built from a set of gradients. This smoothing process depends on the step size used in the line search and produces, in effect, a general convexity of the zigzag, jumping cost function curve. The general convexity may mainly be governed by the adiabatic dynamics of the model. Therefore, the subgrid small-scale processes described by the parameterized physics may not hurt the general convexity of the cost function, although they make the cost function jagged. This explains, to some degree, why a minimization algorithm that was developed for minimizing differentiable cost functions may still perform well for most 4DVAR problems using a diabatic assimilation model. Examples include assimilation experiments using precipitation (Zou and Kuo 1996) and precipitable water (Kuo et al. 1996) using a diabatic mesoscale model and its adjoint, and 1DVAR minimization using the relaxed Arakawa–Schubert scheme (Fillion and Mahfouf 2000). However, under certain specific situations, the limited-memory quasi-Newton algorithm may fail to minimize a piecewise differentiable cost function (Zhang et al. 2000). In that case, a nonsmooth optimization algorithm, such as the bundle method (Lemaréchal 1978), may improve the minimization (Zhang et al. 2000). Even a nonsmooth optimization algorithm using subgradients is still not free of problems (Zhang et al. 2000). In such a situation, further tests of and thought on the generalized adjoint (Xu 1996a,b; 1997a,b; Xu et al. 1998; Xu and Gao 1999) may offer one potential way to solve the problem.
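The idea that piecewise gradients follow the general convexity of a jagged cost function can be illustrated with a deliberately simple experiment. The sketch below uses plain fixed-step gradient descent, not the limited-memory quasi-Newton algorithm with a line search, and the cost function (a quadratic plus hypothetical on–off jumps of height 0.05) is made up for the illustration.

```python
import math

def cost(a):
    """Convex quadratic plus hypothetical on-off jumps of height 0.05."""
    return (a - 1.0) ** 2 + 0.05 * (math.floor(10.0 * a + 0.5) % 2)

def gradient(a):
    """Slope of the differentiable piece containing `a`; jumps contribute nothing."""
    return 2.0 * (a - 1.0)

def descend(a0, step=0.1, n_iter=100):
    """Fixed-step gradient descent driven only by the piecewise gradient."""
    a, path = a0, [a0]
    for _ in range(n_iter):
        a = a - step * gradient(a)
        path.append(a)
    return path

path = descend(3.0)
print("final a:", path[-1], "final J:", cost(path[-1]))
increases = sum(cost(b) > cost(a) for a, b in zip(path, path[1:]))
print("J increased on", increases, "iterations")
```

The iterates converge to the stationary point of the smooth part even if the cost function value itself rises on some iterations when a jump is crossed. This toy case is favorable by construction, since the minimum lies inside a cell; it does not cover the situations in which a smooth minimizer fails.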

This study only discussed the case of zeroth-order discontinuities, in which the cost function under consideration is itself discontinuous. Higher-order discontinuities, that is, situations in which the cost function is continuous but one of its derivatives is discontinuous, may also be present in numerical models of the atmospheric flow. The investigation of the impact of higher-order discontinuities on variational data assimilation, especially for meso- and convection-scale analysis, requires more research.

This study examined only some of the numerical results from the tangent linear and adjoint models of either simple models or a coarse-resolution global prediction model. An investigation of the behavior of the cost function with discontinuities, and of its minimization, for meso- and convection-scale analysis is required and is planned for future work.

Acknowledgments

The research is supported by NOAA Grant NA77WA0571 and NSF Grant ATM-9812729. The authors would like to thank Drs. E. Kalnay and J. Sela for their persistent support and encouragement, and Dr. Olivier Talagrand and Dr. Qin Xu for their suggestions that were useful for improving the original manuscript. Thanks go to two anonymous reviewers for thorough and helpful comments and suggestions.

REFERENCES

  • Arakawa, A., and W. H. Schubert, 1974: Interaction of a cumulus ensemble with the large-scale environment. Part I. J. Atmos. Sci., 31, 674–701.

  • Betts, A. K., 1986: A new convective adjustment scheme. Part I: Observational and theoretical basis. Quart. J. Roy. Meteor. Soc., 112, 677–691.

  • Courtier, P., and O. Talagrand, 1987: Variational assimilation of meteorological observations with the adjoint equation. Part I: Numerical results. Quart. J. Roy. Meteor. Soc., 113, 1329–1347.

  • Errico, R. M., 1997: What is an adjoint model? Bull. Amer. Meteor. Soc., 78, 2577–2591.

  • Fillion, L., and R. Errico, 1997: Variational assimilation of precipitation data using moist convective parameterization schemes: A 1D-Var study. Mon. Wea. Rev., 125, 2917–2942.

  • Fillion, L., and J-F. Mahfouf, 2000: Coupling of moist-convective and stratiform precipitation processes for variational data assimilation. Mon. Wea. Rev., 128, 109–124.

  • Hong, S. Y., and H. L. Pan, 1996: Nonlocal boundary-layer vertical diffusion in a medium-range model. Mon. Wea. Rev., 124, 2322–2339.

  • Kuo, Y-H., X. Zou, and Y-R. Guo, 1996: Variational assimilation of precipitable water using a nonhydrostatic mesoscale adjoint model. Part I: Moisture retrieval and sensitivity experiments. Mon. Wea. Rev., 124, 122–147.

  • Le Dimet, F. X., and O. Talagrand, 1986: Variational algorithms for analysis and assimilation of meteorological observations: Theoretical aspects. Tellus, 38A, 97–110.

  • Lemaréchal, C., 1978: Nonsmooth optimization and descent methods. International Institute for Applied System Analysis, Laxenburg, Austria, 25 pp.

  • Lemaréchal, C., and C. Sagastizabal, 1997: Variable metric bundle methods: From conceptual to implementable forms. Math. Programm., 76, 393–410.

  • Liu, D. C., and J. Nocedal, 1989: On the limited memory BFGS method for large scale optimization. Math. Programm., 45, 503–528.

  • Mahfouf, J-F., and F. Rabier, 2000: The ECMWF operational implementation of four-dimensional variational assimilation. Part II: Experimental results with improved physics. Quart. J. Roy. Meteor. Soc., 126, 1171–1190.

  • Navon, I. M., and D. M. Legler, 1987: Conjugate-gradient methods for large-scale minimization in meteorology. Mon. Wea. Rev., 115, 1479–1502.

  • Navon, I. M., X. Zou, J. Derber, and J. Sela, 1992: Variational data assimilation with an adiabatic version of the NMC spectral model. Mon. Wea. Rev., 120, 1433–1446.

  • Pan, H. L., and W. S. Wu, 1995: Implementing a mass flux convection parameterization package for the NMC medium-range forecast model. NMC/NOAA/NWS Tech. Rep., 409, 40 pp.

  • Pierrehumbert, R. T., 1986: An essay on the parameterization of orographic gravity wave drag. [Available from GFDL/NOAA, Princeton University, Princeton, NJ 08542.]

  • Sela, J., 1982: The NMC spectral model. NOAA Tech. Rep. NWS 30, 36 pp.

  • Sela, J., 1987: The new T80 NMC operational spectral model. Preprints, Eighth Conf. Numerical Weather Prediction, Baltimore, MD, Amer. Meteor. Soc., 312–313.

  • Xiao, Q., X. Zou, and Y-H. Kuo, 2000: Incorporating the SSM/I-derived precipitable water and rainfall rate into a numerical model: A case study for the ERICA IOP-4 cyclone. Mon. Wea. Rev., 128, 87–108.

  • Xu, Q., 1996a: Generalized adjoint for physical processes with parameterized discontinuities. Part I: Basic issues and heuristic examples. J. Atmos. Sci., 53, 1123–1142.

  • Xu, Q., 1996b: Generalized adjoint for physical processes with parameterized discontinuities. Part II: Vector formulations and matching conditions. J. Atmos. Sci., 53, 1143–1155.

  • Xu, Q., 1997a: Generalized adjoint for physical processes with parameterized discontinuities. Part III: Multiple threshold conditions. J. Atmos. Sci., 54, 2713–2721.

  • Xu, Q., 1997b: Generalized adjoint for physical processes with parameterized discontinuities. Part IV: Problems in time discretization. J. Atmos. Sci., 54, 2722–2728.

  • Xu, Q., and J. Gao, 1999: Generalized adjoint for physical processes with parameterized discontinuities. Part VI: Minimization problems in multidimensional space. J. Atmos. Sci., 56, 994–1002.

  • Xu, Q., J. Gao, and W. Gu, 1998: Generalized adjoint for physical processes with parameterized discontinuities. Part V: Coarse-grain adjoint and problems in gradient check. J. Atmos. Sci., 55, 2130–2135.

  • Zhang, S., 2000: Use of adjoint physics in 4D VAR with the NCEP Global Spectral Model. Ph.D. dissertation, The Florida State University, 185 pp.

  • Zhang, S., X. Zou, J. Ahlquist, I. M. Navon, and J. G. Sela, 2000: Use of differentiable and nondifferentiable optimization algorithms for variational data assimilation with discontinuous cost functions. Mon. Wea. Rev., 128, 4031–4044.

  • Zhu, Y., and I. M. Navon, 1999: Impact of key parameters estimation on the performance of the FSU spectral model using the full physics adjoint. Mon. Wea. Rev., 127, 1497–1517.

  • Zou, X., 1997: Tangent linear and adjoint of “on-off” processes and their feasibility for use in 4-dimensional variational data assimilation. Tellus, 49A, 3–31.

  • Zou, X., and Y-H. Kuo, 1996: Rainfall assimilation through an optimal control of initial and boundary conditions in a limited-area mesoscale model. Mon. Wea. Rev., 124, 2859–2882.

  • Zou, X., I. M. Navon, M. Berger, P. K. H. Phua, T. Schlick, and F. X. LeDimet, 1993a: Numerical experience with limited-memory quasi-Newton and truncated-Newton methods. SIAM J. Optimization, 3, 582–608.

  • Zou, X., I. M. Navon, and J. G. Sela, 1993b: Variational data assimilation with moist threshold processes using the NMC spectral model. Tellus, 45A, 370–387.

  • Zupanski, D., 1993: The effects of discontinuities in the Betts–Miller cumulus convection scheme on four-dimensional variational data assimilation in a quasi-operational forecasting environment. Tellus, 45A, 511–524.


Fig. 1.

Variation of the cost function Jmodel, defined as a weighted sum of the 6-h squared forecast errors [see (21)], using the NCEP adiabatic (top) and diabatic (bottom) global spectral models with a time filter coefficient αA (left) and a horizontal diffusion coefficient αHD (right) (see section 4). The cost function is evaluated by integrating the nonlinear models using 0.001 and 0.001 × 10^16 as the intervals of αA and αHD, respectively

Citation: Monthly Weather Review 129, 11; 10.1175/1520-0493(2001)129<2791:EONRFT>2.0.CO;2

Fig. 2.

Variation of the cost function Jn of a single-variable discontinuous model [see (10) and (11)]. The number “n” represents the total time steps in the assimilation window

Fig. 3.

Same as Fig. 2 except for gradient

Fig. 4.

A schematic diagram of the OTC-AS cumulus parameterization scheme

Fig. 5.

Variation of (a) the temperature adjustment (T2) on the second model level using the OTC-AS parameterization scheme due to a perturbation of ΔT = 0.005 K added to the input temperature on the same level, (b) the nonlinear perturbation solution, and (c) the tangent linear approximation introduced by an input temperature perturbation of 0.005 K at the second model level

Fig. 6.

Variation of the l2 Euclidean norm of (a) the adjusted temperature and specific humidity [R1, see (16)] of the OTC-AS parameterization at a column around 1.875°N, 87°E calculated at an interval of Δα = 0.01, (b) the nonlinear perturbation solution produced by an input perturbation of Δα = 0.01 [R2, see (17)], and (c) the corresponding tangent linear approximation to the nonlinear perturbation [R3, see (18)]

Fig. 7.

Same as Fig. 6 except for all the 384 columns around latitude 3.75°N(S) and Δα = 0.001

Fig. 8.

Variation of the cost function (JAS) defined by (20) (thick dashed line), and the gradient (∇αJAS) (thick solid line) calculated from the adjoint of OTC-AS parameterization. A straight line crossing point P represents a local tangential slope derived from the adjoint. For graphing, the gradient is multiplied by a factor of 0.5. Both JAS and ∇αJAS are calculated at an interval of α = 0.01

Fig. 9.

Same as Fig. 8 except that the OTC-AS parameterization scheme is used on 384 columns around latitude 3.75°N(S), where the input state is varied through (15) using Δα = 0.001; observations are defined as the model solution of the OTC-AS parameterization on those columns around latitude 1.875°N(S), and a factor of 10^−2 is used for graphing the gradient. A zero-gradient line (dotted line) is marked as a reference

Fig. 10.

Same as Fig. 1 except for gradient. A zero-gradient line is marked as a reference (dotted line) for each panel

Fig. 11.

The variation of log10(|Φg − 1|) for the OTC-AS parameterization column model with −β (testing the left derivative) and +β (testing the right derivative) at a discontinuity point (point P in Fig. 8). The definition of the cost function and the data used are the same as in Fig. 8. The basic perturbation is taken as ΔT and Δq [see Eq. (15)]

Fig. 12.

Variation of (a) the nonlinear change of the perturbed cost function (JAS) produced by Δα = 0.001 [see (15)], (b) the leading order (linear) approximation derived from the adjoint of the perturbation of JAS, and (c) the ratio of the nonlinear change and the linear approximation, with α
