## Abstract

Sensitivity analysis (SA) generally refers to an assessment of the sensitivity of the output(s) of some complex model with respect to changes in the input(s). Examples of inputs or outputs include initial state variables, parameters of a numerical model, or state variables at some future time. Sensitivity analysis is useful for data assimilation, model tuning, calibration, and dimensionality reduction; and there exists a wide range of SA techniques for each. This paper discusses one special class of SA techniques, referred to as variance based. As a first step in demonstrating the utility of the method in understanding the relationship between forecasts and parameters of complex numerical models, here the method is applied to the Lorenz'63 model, and the results are compared with an adjoint-based approach to SA. The method has three major components: 1) analysis of variance, 2) emulation of computer data, and 3) experimental–sampling design. The role of these three topics in variance-based SA is addressed in generality. More specifically, the application to the Lorenz'63 model suggests that the *Z* state variable is most sensitive to the *b* and *r* parameters, and is mostly unaffected by the *s* parameter. There is also evidence for an interaction between the *r* and *b* parameters. It is shown that these conclusions are true for both simple random sampling and Latin hypercube sampling, although the latter leads to slightly more precise estimates for some of the sensitivity measures.

## 1. Introduction

In contemporary times it is commonplace to represent complex systems with numerical models. Examples include numerical weather prediction models (Richardson 2007), ocean circulation models (Miller 2007), and hydrology models (Rushton 2003). All of these models generally consist of a system of partial differential equations that are numerically integrated and subject to boundary and initial conditions. Generally, such complex models can be viewed as a “black box” with some number of inputs and outputs. Although the choice of the inputs and outputs depends on the specific problem at hand, this paper focuses on model parameters and forecast quantities, respectively. Numerical models often have a large number of parameters whose values are not unambiguously known or even knowable, and so, it is useful to know how the parameters affect the forecasts.

Sensitivity analysis (SA) is the name broadly associated with such questions, although it is performed for a variety of reasons, including variable/input selection, dimensionality reduction, data assimilation, and model tuning or calibration (Cacuci 2003). These different applications of SA are not necessarily mutually exclusive, but this paper focuses on the former. Specifically, the focus here is on the extent to which the outputs are affected by the various inputs across the full range of input values. In statistics “the full range of input values” is called the experimental region, and the question of how to choose it is a topic of text books on experimental design (Douglas 2005). It is important in SA because the sensitivity of the outputs to the inputs can depend on the experimental region. It is unfeasible to examine the full experimental region, and so, sampling that region is usually the practical alternative. Two sampling schemes commonly employed in SA are simple random sampling and systematic sampling, reviewed below.

Given the long history of SA, there exists a wide range of methods for implementing it. One of the more intuitive is referred to as the one-at-a-time (OAT) method (Saltelli et al. 2008, 2010). Generally, in an OAT analysis the inputs are varied one at a time, while all other inputs are held fixed at some value (e.g., at their respective mean), and the change in the output is monitored. However, if the number of inputs is large, the basic OAT approach samples only a small portion of the experimental region. This can be seen as follows: consider three inputs whose values vary along the *x*, *y*, *z* Cartesian coordinates. Varying them one at a time, will sample the points along the axes, but not at the corners of a cube centered at the origin. In three dimensions this is not a serious concern because one often assumes that the black box model is some relatively smooth surface, so that knowledge of its output values along the three axes is sufficient to define it uniquely. However, it is a geometric fact that a high-dimensional space consists mostly of corners (Jimenez and Landgrebe 1998; Scott 1992), and so the basic OAT severely undersamples the experimental region. There exists a variation on the basic OAT, proposed by Morris (1991), which avoids both of these problems, but it will not be discussed here.

The taxonomy of SA methods is complex (Bolado-Lavin and Badea 2008), but one can divide them into two broad categories: local and global. Local methods yield sensitivity results that are valid only in a small region of the experimental region. The basic OAT approach is a local method because its sensitivity results are reliable only in the vicinity of the fixed values assigned to the inputs. Adjoint methods are also local, because they generally rely on the notion of a derivative or Jacobian of the output with respect to an input (Errico 1997). By contrast, global methods allow an examination of sensitivities across the full experimental region. The main advantage of local methods is their speed and transparency, while the main advantage of global methods is their ability to assess the effect of large changes in the inputs.

Although global methods themselves can be subdivided into finer categories, one class is based on a decomposition of the variance of the output(s) into terms corresponding to the different inputs and their interactions. In this way, such variance-based SA methods can assess the manner in which the uncertainty in an output is apportioned across the inputs, and across interactions between them. Variance-based methods are global in the sense that the sensitivity results do not pertain to any specific value of the inputs, and they are multivariate in that they can assess individual inputs and their interactions. This generality of variance-based SA methods is the reason why it is the main focus of this work.

Variance-based SA involves three ingredients. In the first, classical concepts from analysis of variance (ANOVA) are employed to define measures of sensitivity. The second ingredient involves estimation of conditional expectations in terms of which all of the sensitivity measures are written. The estimation is based on data obtained from running the numerical model for a sample of model parameters; methods of experimental design are used for deciding what type of sample to take—the third ingredient. All three ingredients are discussed below. The next subsection provides brief reviews of a number of SA applications in the geosciences.^{1} It is followed by a section further describing the three ingredients of variance-based SA. Section 3 applies that methodology to the Lorenz'63 model, and compares the results with those obtained from an adjoint-based approach to SA. A summary of the conclusions and their discussion is presented in section 4. The main goals of the paper are 1) to review and promote the use of variance-based SA methods, and 2) to apply the method to the Lorenz'63 model both as a demonstration of the SA method, as well as gaining a better understanding of the model itself.

### Past applications of SA

Stein and Alpert (1993) discuss the limitations of OAT analysis in atmospheric numerical models. They show that difference maps do not show interactions between model parameters. For example, they examine the effect of topography and surface fluxes on rainfall, and show that the interaction between the two factors contributes more to rainfall than either of the factors alone.

Alapaty et al. (1997) study the effect of five parameters on the boundary layer structure. Although that work relies on an OAT approach, in a follow-up work Niyogi et al. (1999) extended the analysis by using ANOVA to assess the simultaneous effect of the parameters on sensitivity measures.

Murphy et al. (2004) perform an OAT analysis on 29 parameters. The values of the parameters are selected according to expert advice. The forecast quantities considered are the standard deviation (across ensemble) of temperature, precipitation, and pressure. In a similar climate study, Sanderson (2011) performs a comparison of two climate models in terms of the distribution of the values of climate sensitivity (i.e., the change in mean temperature resulting from a doubling of CO_{2} concentration), with respect to four parameters. In one model they consider four parameters, each sampled at three levels; in statistics, such a design is called a 3^{4} factorial design. Mishra et al. (2010) examine the effect of six parameters on streamflow. They perform an OAT analysis, followed by a 2^{8} factorial analysis.

The idea of sampling different model parameters and/or different initial conditions arises also in ensemble prediction systems, where the main focus is on sampling the initial conditions so as to maximize the variance (across the ensemble) of the forecasts. There exist different methods for generating these samples. Some are designed to optimize the manner in which perturbations grow in time; methods based on singular and bred vectors belong to that class. Magnusson et al. (2008) compare these two methods for sampling initial perturbations. For the purpose of sampling model parameters, however, one often considers random or systematic sampling techniques. Hacker et al. (2011), for example, use a variant of the latter, called Latin hypercube sampling, to set the values of four model parameters in a mesoscale model. They also perform two types of SA; in the first, they examine the effect of each of the four parameters on the mean absolute difference between a forecast where the model parameter is set to a minimum value and another forecast where the model parameter is set to a maximum value. In other words, they follow an OAT, 2^{4} factorial design. They also study the effect of the model parameters on the mean and variance of the forecasts across the spatial domain.

Many of the basic ideas in SA arise in the calibration of ensembles, as well. For example, Golaz et al. (2007) consider 10 parameters in a single-column model parameterizing boundary layer clouds. They take simple random samples from a uniform distribution centered on specified initial values of the parameters, and then optimize the parameters by minimizing a squared-norm cost function. They highlight how such an analysis can help in identifying structural errors in the model. Although similar to SA, this type of analysis is different both in the ultimate goal of the analysis and in the technical implementation. As an example of the latter, note that calibration requires a cost function comparing model output with observations, but SA does not require observations at all.

The impact of observations can also be assessed through SA methods (Torn and Hakim 2008). Hakim and Torn (2008) propose a method, called ensemble synoptic analysis, which offers yet another means of performing SA suitable for assessing sensitivity to initial conditions. Gombos and Hansen (2008) apply the ensemble synoptic analysis method to potential vorticity forecasts from the Weather Research and Forecasting (WRF) Model, and compare the results of that statistical technique with those of a dynamic approach due to Davis and Emanuel (1991). Ancell and Hakim (2007) compare the ensemble approach to SA with the more traditional adjoint-based approach, and show that the two methods are equivalent when the initial conditions are spatially uncorrelated. Beyond this simple comparison, which is proven analytically, further comparisons are complex, and so the two methods have their respective advantages and disadvantages.

A variance-based SA of a numerical model in the geosciences is performed by Zhao and Tiede (2011). The numerical model examined there involves five parameters, and the response is changes in gravity measured at the earth's surface in the vicinity of a volcano; the model parameters are randomly sampled from a uniform distribution.

The “data” generated by all such studies are called computer data, and are further discussed in section 2c. One of the earliest general treatments of computer data, at least in meteorological circles, dates back to 1993 (Bowman et al. 1993). That work demonstrates the importance of proper sampling in assessing the effects of five parameters in a general circulation model.

## 2. Variance-based SA

The specific formulation of variance-based SA presented here follows that of Oakley and O'Hagan (2004). The foundations of the variance-based approach are based on two mathematical facts. The first is the variance-decomposition formula, also known as the law of total variance (Weiss 2005, p. 385):

where *E*[·] and *V*[·] denote expected value and variance, respectively. Here *E*[*y* | *x _{i}*] and

*V*[

*y*|

*x*] denote the conditional expected value and conditional variance, respectively, of the output, given the inputs

_{i}*x*; here,

_{i}*i*refers to the

*i*th input. Intuitively, this decomposition states that the total variance in

*y*,

*V*[

*y*], can be written as the sum of two terms, one measuring the variance “between” the conditional means, and the other measuring the mean of the conditional (“within”) variances.

The second fact is often known as the high-dimensional model representation (Santner et al 2003; Sobol 1993); it states that any function of the type, *y* = *η*(*x*_{1}, *x*_{2}, …, *x _{n}*), can be decomposed as follows:

where

The *z _{i}*(

*x*) are referred to as main effects, and the

_{i}*z*(

_{ij}*x*,

_{i}*x*) are called the first-order interactions. The dots “…” in Eq. (2) indicate that there exist higher-order interaction terms in the expansion, but here they are assumed to be relatively small. The approximation in Eq. (2) is adequate for a wide range of functions, but has been well studied for piecewise continuous functions (Chowdhury et al. 2008). Intuitively,

_{j}*η*represents the function mapping the model parameters to the forecast quantity of interest. It is important to point out that the computation of the expected values and variances requires the joint probability distribution of all the inputs. As a result, even a main effect computation for a given input generally involves the other inputs.

### a. Measures of sensitivity

The measure of importance for an input is a user-dependent quantity, but a few common measures natural to the formulation of the variance-based methods are as follows (Oakley and O'Hagan 2004). One natural measure gauges the expected reduction in the variance of the output, given an input: *E*[*V*[*y*] − *V*[*y* | *x _{i}*]]. This measure, denoted

*V*, is equal to

_{i}*V*[

*y*] −

*E*[

*V*[

*y*|

*x*]], which according to Eq. (1) can be written as

_{i}Similarly, the expected reduction in the variance of the output, given two inputs, *x _{i}* and

*x*, is

_{j}The quantity *V _{i}*

_{+j}measures the sensitivity of the output with respect to

*both x*and

_{i}*x*. From Eqs. (2) to (6) it follows that

_{j}*V*

_{i}_{+j}=

*V*+

_{i}*V*+

_{j}*V*[

*z*(

_{ij}*x*,

_{i}*x*)], and so, a measure that gauges the level of interaction between

_{j}*x*and

_{i}*x*[at least when

_{j}*z*(

_{i}*x*) is independent of

_{i}*z*(

_{j}*x*)] can be defined as

_{j}Another useful quantity measures the uncertainty remaining in the output, given all inputs, except *x _{i}*. For example,

measures the remaining–unexplained variance in *y* after all inputs have been fixed, except *x*_{1}. Traditionally, the *V _{i}* and measures are converted to proportions, as follows:

where *S _{i}* and are called the main effect index and the total effect index, respectively. Although they measure different facets of the importance of the

*i*th input, they do not capture interactions between the inputs. Therefore, it is important to supplement

*S*and with

_{i}*V*

_{i}_{+j}or

*V*for a more complete assessment of sensitivity. Table 1 provides a summary of these measures and their meaning.

_{ij}In special cases, these sensitivity measures can be related to other, common measures of sensitivity. For example, it can be shown that for *y* = *η*(*x*_{1}, *x*_{2}) = *β*_{0} + *β*_{1}*x*_{1} + *β*_{2}*x*_{2}, if *x*_{1} and *x*_{2} are independent and centered (i.e., with *E*[*x _{i}*] = 0), then and . In other words,

*V*and

_{i}*S*are directly related to the “regression” coefficients

_{i}*β*. Other sensitivity measures have more complex relations to the

_{i}*β*parameters (Marzban 2011).

_{i}### b. Estimation of E*[output|input]*

All of the measures in Table 1 can be computed from the *z _{i}* and

*z*defined in Eqs. (3) and (4), which are written in terms of conditional expected values. A great deal of the work in variance-based SA methods is focused on efficient and accurate ways for estimating these conditional expectations from data. The data themselves are generated by evaluating the function

_{ij}*η*(·) for some set of

*x*values. The choice of the

_{i}*x*values is a complex issue belonging to the realm of experimental design, described in the next section.

_{i}The methods for estimating the conditional expectations are varied, but they can be broadly divided into Monte Carlo methods (Sobol 1993; Saltelli et al. 2008, 2010), and methods based on emulation, or meta models (Busby 2009; Oakley and O'Hagan 2004; Rougier 2008; Rougier et al. 2009). In the former, the conditional expected values are expressed as their defining integral form, which are then evaluated using Monte Carlo techniques. The latter methods aim to develop a statistical model that approximates *y* = *η*(·). The resulting statistical model is said to emulate the black box model *η*(·). The emulator is then employed to estimate the conditional expectations.

The development of an emulator is a complex and sophisticated procedure. Oakley and O'Hagan (2004) develop a Gaussian process emulator, whose mathematics is similar to Kriging (Chaloner and Verdinelli 1995; Sacks et al. 1989a,b; Welch et al. 1992). More recently, these methods have been extended to allow multiple outputs (Rougier 2008; Rougier et al. 2009).

The estimation method adopted in this paper is a simple emulation approach based on cubic polynomial regression. Recall that a least squares fit to data of the form (*y*, *x*_{1}, *x*_{2}, …) has the property that the fitted value estimates the conditional expected value of the response *y*, given the predictors *x*_{1}, *x*_{2}, … (Bishop 1996, 201–202). Specifically, here, *E*[*y*|*x*_{1}] is estimated by fitting a cubic polynomial through data on *x*_{1} and *y*. Similarly for *E*[*y* | *x*_{2}], and *E*[*y* | *x*_{1}, *x*_{2}], etc.

### c. Experimental design

Experiments involving numerical models generate data that have no experimental error. In other words, a unique set of values for the inputs will always produce the same output. Such data are called *computer data,* and experiments involving computer data are often called in silico—in contrast to in vitro or in vivo experiments performed, respectively, in a laboratory tube or in a living body. Again, the distinguishing characteristic of computer experiments is the absence of experimental error (Fang et al. 2006; Santner et al. 2003). Consequently, the analysis of computer data is somewhat different from that of “real” data. The framework for analyzing computer data is well established (Santner et al. 2003).

In a computer experiment all of the variability of the response is due to variability in the factors. In other words, for computer data one has *V*[*y* | *x*_{1}, *x*_{2}, …, *x _{n}*] = 0, where

*n*is the number of inputs. The variability in

*y*that is utilized to define all the sensitivity measures in Table 1 originates from the variability of the inputs that are

*not*fixed. Either way, there

*is*variability in

*y*even for computer data, so care must be taken in sampling, because the precision of the sensitivity measures depends on the choice of the sampling scheme.

Discretizing the inputs on a grid is a form of sampling, but it is inefficient. For example, consider a two-dimensional grid discretizing the space spanned by two inputs. Then, any two grid points have at least one input in common. As such, the empirical region is not adequately sampled. An alternative to a grid is random sampling of the experimental region. There exists a wide variety of sampling techniques (Douglas 2005), but only two are considered here: simple random sampling (SRS) and Latin hypercube sampling (LHS). The latter is a systematic sampling method that belongs to a class of sampling techniques called *space filling*. The pros and cons of all of these sampling methods have been thoroughly examined (Cioppa and Lucas 2007; Fang et al. 2006; Santner et al. 2003; Urban and Fricker 2010).

SRS has the desirable property of leading to the most precise estimate of the mean, if the population is homogeneous. For example, in sampling a continuous quantity such as height, if the population of heights has no clusters that distinguish different height characteristics, then a mean height computed from an SRS has the smallest variance across multiple samples (if multiple samples were taken). By contrast, when the population is not homogeneous, LHS is designed to yield no-less precise estimates than SRS.^{2} The reason for this difference between the two sampling schemes is that SRS has a tendency to generate clusters. This clustering of data is not of concern when the population is homogeneous, but otherwise results in suboptimal estimates. LHS, however, is designed to scatter the sample across the full experimental region. This latest property is the reason why LHS is said to be *space filling* (Cioppa and Lucas 2007; Fang et al. 2006; Santner et al. 2003).

Figure 1 shows an example of a SRS (open circles) and a LHS (filled circles) taken from a two-dimensional experimental region. This specific realization is uncharacteristic in that the SRS circles clearly cluster together, while the LHS circles do not. However, it serves to demonstrate that SRS may lead to clusters, but LHS cannot. Indeed, by design, no two cases in the LHS have the same values of *x*_{1} and *x*_{2}.

## 3. Lorenz'63

Marzban (2011) considers a few examples that can be solved analytically. One example in which the function representing the black box model is not available in analytic form is the Lorenz'63 model (Lorenz 1963). It is defined as

where the model parameters are *s* (the Prandtl number), *r* (the Rayleigh number), and *b*, the latter being a function of the wavenumber. The state space variables *X*, *Y*, and *Z* measure the intensity of convective motion, and horizontal and vertical temperature variation, respectively (Bellomo and Preziosi 1995).

In terms of the aforementioned black box, the inputs are the model parameters *s*, *r*, and *b*. In this paper, the outputs are not the state space variables *X*, *Y*, and *Z*, but rather the mean value of these quantities over a 20-time-step forward integration of the Lorenz equations. These means are denoted *X*_{mean}, *Y*_{mean}, and *Z*_{mean}.

To obtain a visual sense of the relationship between the outputs and the inputs, 50 equally spaced values are selected for each parameter, the Lorenz equations are integrated forward in time steps of 0.02, and *X*_{mean}, *Y*_{mean}, and *Z*_{mean} are computed. The empirical region (i.e., range of parameters) includes the default values, and is selected to yield a reasonably rich and complex relationship between the outputs and the inputs. That relationship (for only *Z*_{mean}) is shown in Fig. 2.

Although all three parameters are varied, the various panels in this figure show different cross sections of the relationship. The top-left panel shows *Z*_{mean}, in different shades of gray (white = low, black = high), versus *s* and *r*, with the *b* parameter set to its default value of . The top-right panel shows a different perspective of the same relationship; the *r* parameter is shown along the *x* axis, and so, the resulting scatter in the figure corresponds to different values of the *s* parameter. It can be seen that *Z*_{mean} is mostly monotonically increasing with *s* and *r*, with a slight nonlinearity in the extremes where *s* and *r* are both either low or high.

The middle panels show the dependence of *Z*_{mean} on *s* and *b*, with *r* set to its default value of 28. The behavior here is mostly monotonically decreasing with *b*; in fact, that dependence visually overwhelms the dependence on *s* noted in the top panels. The bottom panels show *Z*_{mean} versus *r* and *b*, with *s* set to its default value of 10. This time, the relationship is monotonically decreasing with *b*, but with significant variability due to *r*.

In short, a wide range of behavior can be seen over the specific range of parameters selected. Beyond this empirical region the relationships are more complex, even consisting of discontinuities. Such complexity is not an obstacle in variance-based SA methods, because the emulators often used are highly nonlinear statistical models capable of modeling a wide range of functional behavior. Here, for simplicity, the model employed to emulate the relationships shown in Fig. 2 is a cubic polynomial; there is no evidence from Fig. 2 to suggest a more complex model.

The data shown in Fig. 2 pertain to 50^{3} points in parameter space because 50 different values of each parameter are selected. Such a large sample is possible here only because the Lorentz ‘63 model is relatively simple. A large sample is chosen only to lead to graphs that are visually informative, such as those in Fig. 2. The data for performing SA are not obtained by systematically incrementing all the inputs, for that would be computationally prohibitive for most realistic models. Thus, even though the Lorenz model is sufficiently simple to allow a brute-force approach, the SA is performed on random samples (according to SRS and LHS) of size 50 taken from the experimental region spanned by the parameters *s*, *r*, and *b*.

To compare the two sampling schemes, 100 samples are taken. The resulting sampling variability of all the sensitivity measures is displayed through boxplots. It is important to point out that in a realistic application of the variance-based SA, this step of taking multiple samples is unnecessary. The only reason it is done here is to allow a comparison of the two sampling schemes.

There exists another source of variability in the Lorenz model: initial conditions. Here, for simplicity, and remaining focused on the illustration of the variance-based SA method, every time a sample is taken from the experimental region, the initial conditions are fixed at their default values, (*X*, *Y*, *Z*) = (−14, −13, 47). A similar analysis but where the initial conditions are also randomly selected has been performed. The results are not shown here, because the only difference is “larger” boxplots resulting from the additional variability due to initial conditions.

Figure 3 shows the sensitivity measures for *Z*_{mean}; results for *X*_{mean} and *Y*_{mean} are not shown here. The boxplots summarize the corresponding distributions resulting from 100 trials (i.e., 100 different samples of size 50 taken from the empirical region). The wide–white boxplots correspond to SRS, and the narrow–gray boxplots are for LHS. Comparing the two boxplots across all the panels in Fig. 3, it is clear that LHS yields no-less precise estimates (than SRS) for all of the sensitivity measures; this is evident in the fact that the gray boxplots are generally “shorter” than the white boxplots for many of the sensitivity measures.

The top-left panel shows the distribution of the *V _{i}* measures in Eq. (5) for

*s*,

*r*, and

*b*. Evidently,

*Z*

_{mean}is most sensitively dependent on the

*b*parameter, followed closely by the

*r*parameter. The parameter

*s*appears to have no effect at all. However, it would be a mistake to dismiss

*s*as an important input, because it may be important only in the presence of the other parameters. The measures

*V*

_{s}_{+r},

*V*

_{s}_{+b},

*V*

_{r}_{+b}in Eq. (6) must then be consulted; they are shown in the top-right panel. Comparing this panel with the top-left panel, it is difficult to determine if

*s*is indeed useless, because, for example

*V*

_{s}_{+r}(in top-right panel) appears to be comparable with

*V*+

_{s}*V*(in top-left panel). To better isolate the effect of

_{r}*s*, sensitivity to interaction terms

*V*

*,*

_{sr}*V*

*, and*

_{sb}*V*

*in Eq. (7) may be examined; they are shown in the left-middle panel. There appears to be evidence for a weak interaction between*

_{rb}*r*and

*b*, but it is not clear if

*s*interacts with

*r*or with

*b*.

^{3}The remaining panels show the “total” sensitivity measures in Eqs. (8) and (9) and confirm that that

*Z*

_{mean}is most sensitively dependent on

*b*and

*r*. These conclusions are the same regardless of the sampling scheme.

### The adjoint SA

It is useful to compare the above findings to the results of a more traditional method such as the adjoint method. Hall (1986) has developed a framework wherein the sensitivity of the mean of a state variable, with respect to both initial conditions and model parameters, can be obtained using the adjoint method. Application of that approach to the Lorenz'63 model (with fixed initial conditions) yields

where *X*, *Y*, and *Z* satisfy Eq. (10); *δs*, *δr*, and *δb* are perturbations to the model parameters; and *δZ*_{mean} is the corresponding change in the mean (over time *T*) of *Z*.^{4} The quantities *υ _{x}*,

*υ*, and

_{y}*υ*are components of a column vector , which must be a solution to

_{z}where *N** is the adjoint of the tangent linear model corresponding to Lorenz'63. The details leading to these results are not shown here, but similar expressions can be derived for *δX*_{mean} and *δY*_{mean}. The coefficients of *δs*, *δr*, and *δb* in Eq. (11) can be thought of as measures of sensitivity of *Z*_{mean} with respect to parameter perturbations. Note that these coefficients are functions of time *T*.

Figure 4 shows the logarithm of these sensitivity measures as a function of *T*, although only *T* = 20 is relevant for comparison with the variance-based sensitivity results. The three line thicknesses—from thin to thick—correspond to sensitivity with respect to *δs*, *δr*, and *δb*, respectively. It can be seen that the curve corresponding to *δb* is consistently above the *δr* curve, which in turn is higher than the *δs* curve. This implies that *δZ*_{mean} is most sensitive to *δb*, followed by *δr*, and then *δs*. And, of course, these results are consistent with the results of the variance-based method.

A more qualitative comparison of the two methods is provided in the next section.

## 4. Conclusions and discussion

A variance-based method for sensitivity analysis is reviewed and demonstrated on the Lorenz ‘63 model. The method relies heavily on ideas from analysis of variance, regression modeling, and experimental design (Fang et al. 2006; Santner et al. 2003). It is found that the mean of the *Z* state variable is most sensitively dependent on the *b* parameter. The sensitivity on the *r* parameter is comparable, but the *s* parameter appears to have no effect on *Z*_{mean} at all. There is also indication of a weak interaction between the *r* and *b* parameters. These findings are valid for simple random and Latin hypercube samples. It is also seen that the latter scheme leads to more precise estimates for some sensitivity measures, but not for all. The results are generally consistent with an adjoint sensitivity analysis, in so far as the two methods can be compared.

A qualitative comparison of the variance-based and the adjoint method is as follows: the adjoint method has the desirable feature that an analytic expression can be written for the sensitivity in Eq. (11). However, it has the undesirable property that the sensitivity analysis refers explicitly to a given reference trajectory [i.e., the *X*, *Y*, *Z* appearing in Eq. (11)]. As a result, *δs*, *δr*, and *δb* are all perturbations about a reference parameter set; the variance-based method does not have this limitation. Also, the variance-based method naturally allows for an estimate of the uncertainty in the sensitivity measures (e.g., through the boxplots shown in Fig. 3); the adjoint-based result in Eq. (11) does not. On the other hand, in spite of the flexibility of most emulators (in essentially estimating a response surface) the estimation may simply be wrong. This introduces another source of variability (sometimes called computational uncertainty) in the variance-based method, which does not plague the adjoint approach. The adjoint method of Hall (1986) is naturally suited to estimate *δZ*_{mean}, but it is not clear how to extend the method to allow the estimation of other quantities of interest (e.g., the change in the maximum of *Z*; i.e., *δZ*_{max}). Finally, unlike the variance-based method, the adjoint method in its current form does not allow the estimation of interaction terms; however, that generalization is possible and is currently under investigation. Also under way, are a more thorough assessment of the sensitivity with respect to initial conditions, and an application of variance-based SA to a mature numerical weather prediction model [i.e., Coupled Ocean–Atmosphere Mesoscale Prediction System (COAMPS)].^{5}

Here it has been sufficient to emulate the conditional expectations with a cubic polynomial regression. The main reason (for the sufficiency) is that the number of cases generated for “training” the regression model is practically unbounded, and so there is little or no chance of overfitting. In more realistic examples, where 1) the nonlinearity of the *η*(·) model may require more nonlinear functions, and 2) the *η*(·) model is expensive to run, alternative emulators should be used. Storlie et al. (2009) compare a wide range of emulation methods; one class of emulators which has gained recent popularity is called Gaussian processes regression (Chaloner and Verdinelli 1995; Hsieh 2009; Rougier 2008; Rougier et al. 2009; Sacks et al. 1989a,b; Welch et al. 1992). These are sophisticated models, whose training requires more effort than polynomial regression, which is the reason why they are not used here.

The analysis performed here has involved only one of the outputs (*Z*_{mean}), and the approach can be applied to each output, separately; however, others have proposed multivariate methods where any covariance structure existing across the outputs may also be taken into account (Fasso 2006; Oakley and O'Hagan 2004). This approach will be examined in the context of ongoing work with COAMPS.

Here, it has been sufficient to compare the boxplots of the sensitivity measures, representing sampling variability, only visually. However, a more rigorous treatment will be required if one desires to objectively assess the statistical significance of the results. To that end, it will be necessary to compute–construct the sampling distribution of the sensitivity measures under the null hypothesis of no effect and no interaction. That work is also currently under investigation.

## Acknowledgments

Scott Sandgathe and Greg Hakim are acknowledged for valuable discussions. Partial support for this project was provided by the Office of Naval Research Grants N00014-01-G-0460/0049 and N00014-05-1-0843.

## REFERENCES

*Modelling Mathematical Methods and Scientific Computation*. CRC Press, 497 pp.

*Neural Networks for Pattern Recognition.*Clarendon Press, 482 pp.

*Sensitivity and Uncertainty Analysis*. Chapman and Hall/CRC, 285 pp.

*Design and Analysis of Experiments*. John Wiley & Sons, 643 pp.

*Design and Modeling for Computer Experiments*. Chapman & Hall/CRC, 290 pp.

*Proceedings of the iEMSs Third Biennial Meeting: Summit on Environmental Modelling and Software,*A. Voinov, A. J. Jakeman, and A. E. Rizzoli, Eds., International Environmental Modelling and Software Society, 6 pp. [Available online at http://www.iemss.org/iemss2006/sessions/all.html.]

*Synoptic–Dynamic Meteorology and Weather Analysis and Forecasting: A Tribute to Fred Sanders, Meteor. Monogr.,*No. 55, Amer. Meteor. Soc., 147–162.

*Machine Learning Methods in the Environmental Sciences: Neural Network and Kernels*. Cambridge University Press, 349 pp.

*Ninth Conf. on Artificial Intelligence,*Seattle, WA, Amer. Meteor. Soc., 2.2. [Available online at https://ams.confex.com/ams/91Annual/webprogram/Paper187333.html.]

*Numerical Modeling of Ocean Circulation.*Cambridge University Press, 242 pp.

*Weather Prediction by Numerical Process.*2nd ed. Cambridge University Press, 250 pp.

*Technometrics,*

**51**(4), 414–424.

*Groundwater Hydrology: Conceptual and Computational Models.*John Wiley and Sons Ltd., 416 pp.

*Global Sensitivity Analysis: The Primer*. Wiley Publishing, 304 pp.

*The Design and Analysis of Computer Experiments.*Springer, 299 pp.

*Multivariate Density Estimation; Theory, Practice, and Visualization*. John Wiley & Sons, 325 pp.

*A Course in Probability*. Addison-Wesley, 789 pp.

## Footnotes

^{1}

For the reader more interested in the technical detail (as opposed to the historical detail), this section may be skipped.

^{2}

It can be shown that LHS cannot lead to less precise estimates than SRS. In fact, if the function η(·) is monotonic, then this can be proved analytically (McKay et al. 1979).

^{3}

To settle that question, one would require knowledge of the distribution of the sensitivity measures (i.e., the boxplots) under the null hypothesis of no interaction. See the discussion below.

^{4}

Technically, Hall's method gives the mean of *δZ*, but under general conditions that quantity is equal to the *δZ*_{mean}.

^{5}

COAMPS is a registered trademark of the Naval Research Laboratory.