• Alpert, P., 2011: Meso-meteorology: Factor separation examples in atmospheric meso-scale motions. Factor Separation in the Atmosphere: Applications and Future Prospects, P. Alpert and T. Sholokhman, Eds., Cambridge University Press, 53–66.

  • Alpert, P., and T. Sholokhman, Eds., 2011a: Factor Separation in the Atmosphere: Applications and Future Prospects. Cambridge University Press, 274 pp.

  • Alpert, P., and T. Sholokhman, 2011b: Some difficulties and prospects. Factor Separation in the Atmosphere: Applications and Future Prospects, P. Alpert and T. Sholokhman, Eds., Cambridge University Press, 237–244.

  • Berger, A., M. Claussen, and Q. Yin, 2011: Factor separation methodology and paleoclimates. Factor Separation in the Atmosphere: Applications and Future Prospects, P. Alpert and T. Sholokhman, Eds., Cambridge University Press, 28–52.

  • Collins, L. M., J. J. Dziak, and R. Z. Li, 2009: Design of experiments with multiple independent variables: A resource management perspective on complete and reduced factorial designs. Psychol. Methods, 14, 202–224, https://doi.org/10.1037/a0015826.

  • Collins, L. M., J. J. Dziak, K. C. Kugler, and J. B. Trail, 2014: Factorial experiments: Efficient tools for evaluation of intervention components. Amer. J. Prev. Med., 47, 498–504, https://doi.org/10.1016/j.amepre.2014.06.021.

  • Connolly, P. J., 2018: Shallow water practice model, version 1.0.0. Zenodo, https://doi.org/10.5281/zenodo.1478060.

  • Fisher, R. A., 1971: The Design of Experiments. 8th ed. Hafner Publishing Company, 248 pp.

  • Hardy, M. A., 1993: Regression with dummy variables. Quantitative Applications in the Social Sciences, SAGE Publications, 90 pp.

  • Krichak, S. O., and P. Alpert, 2002: A fractional approach to the factor separation method. J. Atmos. Sci., 59, 2243–2252, https://doi.org/10.1175/1520-0469(2002)059<2243:AFATTF>2.0.CO;2.

  • Kugler, K. C., J. B. Trail, J. J. Dziak, and L. M. Collins, 2012: Effect coding versus dummy coding in analysis of data from factorial experiments. The Pennsylvania State University Tech. Rep., http://methodology.psu.edu/media/techreports/12-120.pdf.

  • Mak, S., and C. F. J. Wu, 2019: cmenet: A new method for bi-level variable selection of conditional main effects. J. Amer. Stat. Assoc., 114, 844–856, https://doi.org/10.1080/01621459.2018.1448828.

  • Montgomery, D. C., 2013: Design and Analysis of Experiments. John Wiley and Sons, 730 pp.

  • National Research Council, 1995: Statistical Methods for Testing and Evaluating Defense Systems: Interim Report. National Academies Press, 84 pp., https://doi.org/10.17226/9074.

  • Peng, C.-Y., 2018: Discussion. Ann. Inst. Stat. Math., 70, 269–274, https://doi.org/10.1007/s10463-017-0640-y.

  • Reuter, G. W., 2011: Application of the factor separation methodology to quantify the effect of waste heat, vapor and pollution on cumulus convection. Factor Separation in the Atmosphere: Applications and Future Prospects, P. Alpert and T. Sholokhman, Eds., Cambridge University Press, 163–170.

  • Smith, J. A., and R. S. Penc, 2016: A design of experiments approach to evaluating parameterization schemes for numerical weather prediction: Problem definition and proposed solution approach. Conf. on Applied Statistics in Defense, Fairfax, VA, Interface Foundation of North America and George Mason University College of Science, 4183–4192.

  • Smith, J. A., R. S. Penc, and J. W. Raby, 2018: Statistical design of experiments in numerical weather prediction: Emerging results. 25th Conf. on Probability and Statistics, Austin, TX, Amer. Meteor. Soc., 6.1, https://ams.confex.com/ams/98Annual/meetingapp.cgi/Paper/326537.

  • Smith, J. A., R. S. Penc, J. W. Raby, and J. L. Cleveland, 2019: Some conclusions on applying statistical design of experiments to numerical weather prediction. 18th Conf. on Artificial and Computational Intelligence and its Applications to the Environmental Sciences, Phoenix, AZ, Amer. Meteor. Soc., TJ17.4, https://ams.confex.com/ams/2019Annual/meetingapp.cgi/Paper/352596.

  • Stein, U., and P. Alpert, 1993: Factor separation in numerical simulations. J. Atmos. Sci., 50, 2107–2115, https://doi.org/10.1175/1520-0469(1993)050<2107:FSINS>2.0.CO;2.

  • Su, H., and C. F. J. Wu, 2017: CME analysis: A new method for unraveling aliased effects in two-level fractional factorial experiments. J. Qual. Technol., 49, 1–10, https://doi.org/10.1080/00224065.2017.11918181.

  • Thunis, P., and Coauthors, 2019: Source apportionment to support air quality planning: Strengths and weaknesses of existing approaches. Environ. Int., 130, 104825, https://doi.org/10.1016/j.envint.2019.05.019.

  • Waugh, D. W., A. M. Hogg, P. Spence, M. H. England, and T. W. N. Haine, 2019: Response of Southern Ocean ventilation to changes in midlatitude westerly winds. J. Climate, 32, 5345–5361, https://doi.org/10.1175/JCLI-D-19-0039.1.

  • Wu, C. F. J., 2018: Rejoinder. Ann. Inst. Stat. Math., 70, 279–281, https://doi.org/10.1007/s10463-017-0639-4.

  • Wu, C. F. J., and M. S. Hamada, 2009: Experiments Planning, Analysis, and Optimization. 2nd ed. Wiley Series in Probability and Statistics, John Wiley and Sons, 716 pp.

  • Yang, L., J. Smith, and D. Niyogi, 2019: Urban impacts on extreme monsoon rainfall and flooding in complex terrain. Geophys. Res. Lett., 46, 5918–5927, https://doi.org/10.1029/2019GL083363.

  • Yoshida, R., 2018: Discussion on the paper by Professor Wu. Ann. Inst. Stat. Math., 70, 275–278, https://doi.org/10.1007/s10463-017-0641-x.

Factor Effects in Numerical Simulations

Judah L. Cleveland, Army Research Laboratory, U.S. CCDC, White Sands Missile Range, New Mexico

Jeffrey A. Smith, Army Research Laboratory, U.S. CCDC, White Sands Missile Range, New Mexico

James P. Collins, Army Research Laboratory, U.S. CCDC, Adelphi, Maryland

Abstract

Numerical simulations allow users to adjust factor settings in experimental runs to understand how changes in those factors affect the output. However, it is not straightforward to analyze these outputs when multiple input factors are changed, especially simultaneously. For the atmospheric sciences, Stein and Alpert introduced a method they termed “factor separation” in order to separate the “pure contribution” of a factor from “pure interactions” of combinations of factors. Although factor separation appears to be used exclusively within the atmospheric sciences, other communities achieve a similar result by computing “main effects” via design of experiments methods. While both methods yield different estimates for the factor effects or contributions, we show that factor separation effects are identical to “simple effects” in the design of experiments literature. We demonstrate how both factor separation effects and design of experiments main effects correspond to multiple linear regression coefficients with different coding methods; thus, effect estimates produced by each method are equivalent through a variable transformation. We illustrate the application of both methods using a shallow-water simulation. This connection between factor separation and the design of experiments discipline extends factor separation to more applications by making available design of experiments methods for decreasing the computational cost and calculating effects for factors with more than two settings, both of which are limitations of factor separation.

Current affiliation: Research Associateship Program, U.S. Army Research Laboratory, White Sands Missile Range, New Mexico.

Corresponding author: Jeffrey A. Smith, jeffrey.a.smith1.civ@mail.mil

1. Introduction

A common theme in many experiments is a desire to both identify and quantify relationships between various factors of interest and the output. Within the atmospheric science community, factor separation (FS) (Stein and Alpert 1993) is one method of conducting such an analysis; however, in communities outside the atmospheric sciences, main and interaction effects from design of experiments (DoE) methods are widely used (e.g., Wu and Hamada 2009). Within the atmospheric science community a number of studies use FS methods; however, to the best of our knowledge DoE has seen little formal practice save only recent work (Smith and Penc 2016; Smith et al. 2018, 2019), which applied DoE methods to the study of a numerical weather prediction code.

To compute the contributions of various factors and their interactions to a predicted field, Stein and Alpert (1993) developed factor separation. In particular, they emphasize that neglecting the presence of interaction between factors can yield misleading results (Alpert and Sholokhman 2011a). Stein and Alpert (1993) demonstrated FS by using it to study the effects of surface fluxes and terrain on precipitation in a simulation. Since then, factor separation has been applied to temperature–albedo feedback of the greenhouse gases (Berger et al. 2011); the effects of topography, convection, latent, and sensible heat fluxes on Alpine lee cyclogenesis (Alpert 2011); the effects of waste heat, vapor, and pollution on cumulus convection (Reuter 2011); and many other situations (see Alpert and Sholokhman 2011a). Factor separation is such a simple and practical method for isolating the contributions and interactions of various factors that members of the atmospheric community quickly saw its usefulness and applied it to many different atmospheric simulations. More than 25 years later, the original FS paper continues to receive citations (e.g., Thunis et al. 2019; Waugh et al. 2019; Yang et al. 2019); however, FS does not appear to be used outside the atmospheric science community.

In other disciplines, an alternative method for attributing output response to different factors comes from statistical DoE. DoE first appeared in 1935 through the work of Fisher, who placed particular emphasis on constructing an experiment so as to optimize the ability to make inferences from the results (Fisher 1971). Although Fisher began by applying his methods to experiments in the agricultural sciences, others quickly applied them to fields as diverse as the chemical and nuclear industries, manufacturing, and the social sciences (National Research Council 1995; Kugler et al. 2012).

The DoE method for quantifying the effects of factors is via the use of main and interaction effects (Wu and Hamada 2009; Montgomery 2013). While the objective of computing these quantities is similar to that of FS, the resulting effect estimates are different. This led us to ask why these methods yield different estimates and whether one method should be preferred over the other. Alpert and Sholokhman (2011b) briefly explored the differences between FS and DoE main effects; however, they presented the two methods as coming from completely different mathematical foundations. We found, however, that DoE “simple effects,” as opposed to “main effects,” are calculated in the same manner as FS effects (Collins et al. 2014). From this we conclude that, at the level of Stein and Alpert’s original paper, factor separation is a special case of design of experiments.

In this paper, we first define many of the terms we use, and then we introduce full factorial designs since both methods employ this experiment paradigm. Next, we give the mathematical foundation of factor separation followed by a discussion of DoE main effects. With both methods outlined, we show the results of our research: namely, that FS effects are actually equivalent to DoE simple effects. Furthermore, through a variable transformation, we show that FS effects are essentially equivalent to DoE main effects for analytic purposes. We show how the differences between simple and main effects do not alter conclusions regarding the models created using either method. Next, we demonstrate how to interpret the results of each method with a simple example using a shallow-water equations (SWE) model. Finally, we address the assertions of Alpert and Sholokhman (2011b) with regard to the benefits of FS effects over main effects, and then propose how DoE can reduce the limitations of FS when computational cost and factors with more than two levels are concerns.

2. Definitions

Here we define some common DoE and FS terms.

a. Basic definitions

  • Factor—A variable in an experiment.

  • Level—A particular value or setting of a factor.

  • Run—An experiment with the factors each set to a specific level.

  • Replication—Multiple runs with the factor levels unchanged. Note: in a well designed and constructed deterministic computer simulation, replication is trivial because each response is identical.

b. Design space

  • Experimental design—The planned set of runs in an experiment.

  • Full factorial design—An experiment where every combination of factor levels is present.

  • Balanced design—An experiment where every factor level occurs in the same number of runs.

  • Orthogonal design—An experiment where all factor level combinations occur in the same number of runs.

c. Effects estimates

  • Main effect—The difference between the average of all runs at the high level of a factor and the low level of that factor.

  • Conditional main effect—The main effect of a factor at a given level of another factor.

  • Simple effect—The conditional main effect of a factor given all other factors are held at their low levels.

  • Interaction effect—The joint effect of two or more factors.

  • Pure contribution—The result of factor separation that is the fraction of the response induced by a particular factor.

  • Pure interaction (or synergy)—The result of factor separation that is the fraction of the response induced by a combination of factors.

  • Base case (zero state)—The run with all factors off (or low).

  • Control run—The run with all factors on (or high).

d. Coding

  • Coding—Assigning numbers to categorical variables in order to incorporate them into a statistical model.

  • Dummy coding—The method of coding that uses 0 for off (or low) and 1 for on (or high).

  • Effects coding—The method of coding that uses −1 for off (or low) and 1 for on (or high).

3. Full factorial designs

Both classical DoE and FS use a full factorial design, though Stein and Alpert (1993) do not use the term in their original paper. A full factorial design has a run for every combination of factors and levels. When all $k$ factors have two levels, typically present and absent or high and low, the design is termed a $2^k$ factorial because there are $2^k$ combinations of factor levels. A full $2^k$ factorial design is balanced and orthogonal (Wu and Hamada 2009; Montgomery 2013). Table 1 illustrates the full $2^k$ factorial design for $k = 3$ factors.

Table 1. The $2^k$ factorial design for $k = 3$ factors at two levels. Minus indicates the factor is absent, and plus indicates it is present.

In the discussion that follows, we focus exclusively on the case when all factors have two levels. Mixed designs, where some study factors have more than two levels, are discussed by Wu and Hamada (2009) and Montgomery (2013); however, they are not a subject of this paper.
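As a minimal sketch of the design described in this section, the following Python snippet enumerates a two-level full factorial for three factors and checks the balance and orthogonality properties. The function name `full_factorial` and the run ordering are assumptions made for illustration; the ordering need not match Table 1.

```python
from itertools import product


def full_factorial(factors):
    """Return every run of a two-level full factorial as a dict factor -> level.

    Levels use effects coding: -1 (absent/low) and +1 (present/high).
    """
    return [dict(zip(factors, levels))
            for levels in product((-1, +1), repeat=len(factors))]


design = full_factorial(["A", "B", "C"])

# Print each run with minus for absent and plus for present.
for run in design:
    print(" ".join("+" if run[name] > 0 else "-" for name in ("A", "B", "C")))

# Balanced: each factor is high in exactly half of the 2^k runs.
high_counts = {name: sum(run[name] == 1 for run in design) for name in "ABC"}

# Orthogonal: every pair of coded columns has a zero dot product.
dot_ab = sum(run["A"] * run["B"] for run in design)
dot_ac = sum(run["A"] * run["C"] for run in design)
dot_bc = sum(run["B"] * run["C"] for run in design)
```

With $k = 3$ the list holds $2^3 = 8$ runs, each factor is present in four of them, and all pairwise dot products vanish.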

4. Stein and Alpert factor separation

Stein and Alpert (1993) introduced factor separation, which calculates pure contributions and pure interactions. They set up the mathematical basis as follows:

The field $f$ depends on $n$ factors $\psi_i$ ($i = 1, 2, \ldots, n$), where

$$\psi_i(c_i) = c_i \psi_i, \qquad 0 \le c_i \le 1. \tag{1}$$

When a factor $\psi_i$ is absent, $c_i = 0$; when it is present, $c_i = 1$. The field $f$ is a continuous function of the $c_i$ ($i = 1, 2, \ldots, n$):

$$f = f(c_1, c_2, \ldots, c_n). \tag{2}$$

They then decompose $f$ through a Taylor series expansion:

$$f(c_1, c_2, \ldots, c_n) = \hat{f}_0 + \sum_{i=1}^{n} \hat{f}_i(c_i) + \sum_{i,j=1,2}^{n-1,n} \hat{f}_{ij}(c_i, c_j) + \cdots + \hat{f}_{123 \ldots n}(c_1, c_2, \ldots, c_n), \tag{3}$$

where each $\hat{f}_{ijk\ldots}(c_i, c_j, c_k, \ldots)$ is identically zero if any of its arguments $c_i, c_j, c_k, \ldots$ is zero. Additionally, we can drop the ones by setting $\hat{f}_{ij\ldots} = \hat{f}_{ij\ldots}(1, 1, \ldots)$.

The quantity $\hat{f}_0$ is known as the base case and is the value when none of the chosen factors are present. Thus, it is the portion contributed by other influences independent of the chosen factors. The quantity $\hat{f}_i$ is the pure contribution of factor $i$, $\hat{f}_k$ the pure contribution of factor $k$, and $\hat{f}_{ik}$ the contribution due to the pure interaction, or synergy, of factors $i$ and $k$, and similarly for more factors.

For example, for the field with three factors i, j, and k where i and k are present but j is absent, the result consists of the base case, the pure contribution of i, the pure contribution of k, and the synergy of i and k:

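$$\begin{aligned}
f(c_i, c_j, c_k) = f(1, 0, 1) = f_{ik} &= \hat{f}_0 + \hat{f}_i(1) + \hat{f}_j(0) + \hat{f}_k(1) + \hat{f}_{ij}(1, 0) + \hat{f}_{ik}(1, 1) + \hat{f}_{jk}(0, 1) + \hat{f}_{ijk}(1, 0, 1) \\
&= \hat{f}_0 + \hat{f}_i + \hat{f}_k + \hat{f}_{ik}.
\end{aligned} \tag{4}$$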

A full factorial for n factors yields the following results:

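$$f_0 \equiv f(0, 0, \ldots, 0) = \hat{f}_0, \tag{5}$$

$$f_i = \hat{f}_0 + \hat{f}_i, \tag{6}$$

$$f_{ij} = \hat{f}_0 + \hat{f}_i + \hat{f}_j + \hat{f}_{ij}, \tag{7}$$

$$f_{ijk} = \hat{f}_0 + \hat{f}_i + \hat{f}_j + \hat{f}_k + \hat{f}_{ij} + \hat{f}_{jk} + \hat{f}_{ik} + \hat{f}_{ijk}, \tag{8}$$

$$f_{123 \ldots n} = \hat{f}_0 + \sum_{i=1}^{n} \hat{f}_i + \sum_{i,j=1,2}^{n-1,n} \hat{f}_{ij} + \sum_{i,j,k=1,2,3}^{n-2,n-1,n} \hat{f}_{ijk} + \cdots + \hat{f}_{123 \ldots n}. \tag{9}$$

Notice there are $2^n$ equations with $2^n$ unknowns, so each $\hat{f}_{ij\ldots}$ can be solved for explicitly using recursive elimination.

For example, using the runs from the $2^3$ factorial design found in Table 1, the pure contributions and interactions for the three factors A, B, and C are computed by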

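$$\hat{f}_0 = f_0, \tag{10}$$

$$\hat{f}_A = f_A - f_0, \tag{11}$$

$$\hat{f}_B = f_B - f_0, \tag{12}$$

$$\hat{f}_C = f_C - f_0, \tag{13}$$

$$\hat{f}_{AB} = f_{AB} - f_A - f_B + f_0, \tag{14}$$

$$\hat{f}_{AC} = f_{AC} - f_A - f_C + f_0, \tag{15}$$

$$\hat{f}_{BC} = f_{BC} - f_B - f_C + f_0, \tag{16}$$

$$\hat{f}_{ABC} = f_{ABC} - f_{AB} - f_{AC} - f_{BC} + f_A + f_B + f_C - f_0. \tag{17}$$

In the base case, $\hat{f}_0$, the contribution from factors is not considered; $\hat{f}_A$ is the pure contribution of factor A, while $\hat{f}_B$ and $\hat{f}_C$ are the pure contributions of factors B and C, respectively. The term $\hat{f}_{AB}$ is the contribution from the pure interaction of factors A and B, and similarly for the other interaction terms.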
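The recursive elimination for the three-factor case can be sketched in Python as an inclusion-exclusion sum over subsets of the factors that are present. This is an illustrative sketch, not the authors' code: the helper names `subsets` and `factor_separation` are hypothetical, and the run values are made-up numbers chosen only to exercise the formulas.

```python
from itertools import combinations


def subsets(factors):
    """Yield every subset of `factors` as a frozenset, smallest first."""
    for size in range(len(factors) + 1):
        for combo in combinations(factors, size):
            yield frozenset(combo)


def factor_separation(runs, factors):
    """Solve the 2^n factor separation equations by inclusion-exclusion.

    `runs` maps the frozenset of factors that are present to the simulated
    value f. The result maps each subset S to f_hat_S: the base case for the
    empty set, a pure contribution for one factor, a synergy otherwise.
    """
    effects = {}
    for target in subsets(factors):
        total = 0.0
        for inner in subsets(sorted(target)):
            # Runs that differ from the target by an odd number of factors
            # enter with a minus sign, the others with a plus sign.
            sign = (-1) ** (len(target) - len(inner))
            total += sign * runs[inner]
        effects[target] = total
    return effects


# Hypothetical responses for the eight runs of a 2^3 design.
runs = {
    frozenset(): 10.0,
    frozenset("A"): 12.0,
    frozenset("B"): 13.0,
    frozenset("C"): 11.0,
    frozenset("AB"): 18.0,
    frozenset("AC"): 13.5,
    frozenset("BC"): 14.0,
    frozenset("ABC"): 21.0,
}

effects = factor_separation(runs, ("A", "B", "C"))
```

For these made-up numbers the synergy of A and B is $18 - 12 - 13 + 10 = 3$, and the eight separated terms sum back to the run with all factors present, as the decomposition requires.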

5. Design of experiments: Main and interaction effects

Computing the effects of various factors is a commonly used method from the design of experiments toolkit. The effects are typically calculated from the results of a full factorial experiment. The main effect of a factor is the change in the response induced by a change in the factor’s level (Wu and Hamada 2009; Montgomery 2013). A main effect is the difference between the average of all observations at the high level of the factor and the average at the low level. Using the notation of Wu and Hamada (2009), the main effect of a factor A is

$$\mathrm{ME}(A) = \bar{z}(A+) - \bar{z}(A-), \tag{18}$$

where $\bar{z}(A+)$ is the average of all runs when A is present (high) and $\bar{z}(A-)$ is the average of all runs when A is absent (low).

Using the three-factor experiment from Table 1, the main effect of factor A is found by

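$$\begin{aligned}
\mathrm{ME}(A) &= \tfrac{1}{4}(f_A + f_{AB} + f_{AC} + f_{ABC}) - \tfrac{1}{4}(f_0 + f_B + f_C + f_{BC}) \\
&= \tfrac{1}{4}(f_A + f_{AB} + f_{AC} + f_{ABC} - f_0 - f_B - f_C - f_{BC}).
\end{aligned} \tag{19}$$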

The other main effects are similarly calculated.

When the response for a factor changes as the levels of one or more other factors change, we say there is an interaction effect. Once again using Wu and Hamada’s notation, the interaction effect of factors A and B is

$$\mathrm{INT}(A, B) = \tfrac{1}{2}\left[\mathrm{ME}(A \mid B+) - \mathrm{ME}(A \mid B-)\right], \tag{20}$$

where ME(A|B+) is the conditional main effect of A when B is present and likewise ME(A|B−) when B is absent. These are calculated by

$$\mathrm{ME}(A \mid B+) = \bar{z}(A+ \mid B+) - \bar{z}(A- \mid B+), \tag{21}$$

$$\mathrm{ME}(A \mid B-) = \bar{z}(A+ \mid B-) - \bar{z}(A- \mid B-), \tag{22}$$

where $\bar{z}(A+ \mid B+)$ is the average of the runs with both A and B present and similarly for the other terms. The interaction of AB compares the effect of A when B is on and when B is off. If there is a significant difference in how A affects the output depending on B, the interaction effect will be large.

Using our example from Table 1, the interaction effect of A and B is

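$$\begin{aligned}
\mathrm{INT}(A, B) &= \tfrac{1}{2}\left\{\left[\tfrac{1}{2}(f_{AB} + f_{ABC}) - \tfrac{1}{2}(f_B + f_{BC})\right] - \left[\tfrac{1}{2}(f_A + f_{AC}) - \tfrac{1}{2}(f_C + f_0)\right]\right\} \\
&= \tfrac{1}{4}(f_{AB} + f_{ABC} - f_B - f_{BC} - f_A - f_{AC} + f_C + f_0)
\end{aligned} \tag{23}$$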

and similarly for the interactions of A and C and of B and C.

The process is similar for interaction effects involving three or more factors.
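The main, conditional, and interaction effect definitions of this section can be sketched as follows. The function names `average`, `main_effect`, and `interaction` are hypothetical, and the eight responses are made-up numbers rather than results from the paper.

```python
from statistics import mean

FACTORS = ("A", "B", "C")

# Hypothetical responses keyed by the (A, B, C) levels, 0 = absent, 1 = present.
runs = {
    (0, 0, 0): 10.0, (1, 0, 0): 12.0, (0, 1, 0): 13.0, (0, 0, 1): 11.0,
    (1, 1, 0): 18.0, (1, 0, 1): 13.5, (0, 1, 1): 14.0, (1, 1, 1): 21.0,
}


def average(conditions):
    """Average of all runs whose levels match `conditions` (factor -> 0 or 1)."""
    values = [value for levels, value in runs.items()
              if all(levels[FACTORS.index(name)] == level
                     for name, level in conditions.items())]
    return mean(values)


def main_effect(factor, given=None):
    """ME(factor), or a conditional main effect when `given` fixes other factors."""
    given = given or {}
    return average({factor: 1, **given}) - average({factor: 0, **given})


def interaction(first, second):
    """INT(first, second) = [ME(first | second+) - ME(first | second-)] / 2."""
    return 0.5 * (main_effect(first, {second: 1}) - main_effect(first, {second: 0}))


me_a = main_effect("A")                       # averages over B and C
int_ab = interaction("A", "B")                # symmetric in A and B
se_a = main_effect("A", {"B": 0, "C": 0})     # all other factors held low
```

Fixing every other factor at its low level in `main_effect` gives the simple effect from the definitions section, which for these numbers is $f_A - f_0 = 2$, whereas the main effect of A averaged over B and C is 4.125.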

6. Coding, simple effects, and factor separation

To understand the relationship between coding, effects, and factor separation, consider the general linear model used in multiple linear regression. The response $y$ is a function of factors $x_1, x_2, \ldots, x_m$ such that

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_m x_m + \varepsilon, \tag{24}$$

where $\varepsilon$ is random noise assumed to be normally distributed with mean 0 and variance $\sigma^2$ (Wu and Hamada 2009).

We can take the expected value of Eq. (24) to find that

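$$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_m x_m. \tag{25}$$

We account for interaction among the $x_i$ in Eq. (24) by using $\beta_{ij}$ with $x_i x_j$. For example, in the two-factor case:

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{12} x_1 x_2 + \varepsilon. \tag{26}$$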

When the $x_i$ correspond to categorical rather than numerical variables, they are typically coded. One option, effects coding, uses −1 to denote that a factor is absent and 1 that it is present; another, dummy coding, uses 0 for absent and 1 for present (Hardy 1993). The coefficients of a dummy-coded linear regression are known as simple effects, as opposed to main effects, because the 0 coding means they measure the effect of a factor only when all other factors are off (Collins et al. 2009, 2014).

Since regression coefficients in Eq. (24) measure a one-unit change and the main effects measure a two-unit change (−1 to 1) when using effects coding, the regression coefficients are one-half the main and interaction effects; thus, the coefficients quantify how the factor or combination of factors affects the average of all the runs, which is also known as the “grand mean” (Kugler et al. 2012; Wu and Hamada 2009; Montgomery 2013).

The interpretation of the model in Eq. (24) is not as clear when using dummy coding, which we can see by constructing the equations for a factorial design. Let $y_i$ be the run with just factor $x_i$ present, $y_{ijk}$ the run with factors $x_i$, $x_j$, and $x_k$ present, and so on.

Then with dummy coding,

$$E(y_0) = \beta_0, \tag{27}$$

$$E(y_i) = \beta_0 + \beta_i, \tag{28}$$

$$E(y_{ij}) = \beta_0 + \beta_i + \beta_j + \beta_{ij}, \tag{29}$$

$$E(y_{123 \ldots n}) = \beta_0 + \sum_{i=1}^{n} \beta_i + \sum_{i,j=1,2}^{n-1,n} \beta_{ij} + \sum_{i,j,k=1,2,3}^{n-2,n-1,n} \beta_{ijk} + \cdots + \beta_{123 \ldots n}. \tag{30}$$

A full factorial design for $n$ factors with two levels and no replication would yield $2^n$ equations. The run $f_{ijk}$ is used to estimate $E(y_{ijk})$. With the $2^n$ runs from the full factorial, all $2^n$ coefficients $\beta$ can be computed using

$$E(y_{123 \ldots n}) = f_{ijk \ldots n}. \tag{31}$$

Notice this setup is identical to FS with different notation. Every $\hat{f}$ in Eq. (9) corresponds to a $\beta$ in Eq. (30):

$$\hat{f}_{ij\ldots} = \beta_{ij\ldots}. \tag{32}$$

Consequently, FS effects are equivalent to multiple linear regression coefficients for a full factorial with one replication when using dummy coded variables. Dummy coding, however, can be used with any number of replications. For a full factorial design, the mean of the replications of each run is used to find the coefficients.

FS was only designed for a set of runs with a single replication, but simple effects can be calculated with or without replication. Several equivalent ways to express the simple effect of A are

$$\mathrm{SE}(A) = \mathrm{ME}(A \mid \text{other factors}-) = \bar{z}(A+ \mid \text{other factors}-) - \bar{z}(A- \mid \text{other factors}-) = \bar{f}_A - \bar{f}_0, \tag{33}$$

where $\bar{f}_A$ and $\bar{f}_0$ are the averages of the replications of $f_A$ and $f_0$, respectively, if the runs had been replicated. We see then that FS is the special case of calculating the simple effects when there is no replication.
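The correspondence between coding schemes and effect estimates can be checked numerically. The sketch below fits the two-factor model with interaction by least squares under both codings, using NumPy's `numpy.linalg.lstsq`. The four responses are made-up numbers and the helper `fit` is a hypothetical name introduced for illustration.

```python
import numpy as np

# Hypothetical two-factor responses in the run order f0, fA, fB, fAB.
levels = np.array([[0, 0], [1, 0], [0, 1], [1, 1]], dtype=float)
response = np.array([10.0, 12.0, 13.0, 18.0])


def fit(coded):
    """Least-squares fit of y = b0 + b1*x1 + b2*x2 + b12*x1*x2."""
    x1, x2 = coded[:, 0], coded[:, 1]
    design = np.column_stack([np.ones_like(x1), x1, x2, x1 * x2])
    coefficients, *_ = np.linalg.lstsq(design, response, rcond=None)
    return coefficients


dummy = fit(levels)              # 0/1 dummy coding
effects = fit(2.0 * levels - 1)  # -1/+1 effects coding

# Factor separation terms computed directly from the runs.
f0, fA, fB, fAB = response
fs_terms = np.array([f0, fA - f0, fB - f0, fAB - fA - fB + f0])

# Grand mean, then half the main effects and half the interaction effect.
half_doe = np.array([
    response.mean(),
    0.5 * ((fA + fAB) / 2 - (f0 + fB) / 2),
    0.5 * ((fB + fAB) / 2 - (f0 + fA) / 2),
    0.5 * 0.5 * ((fAB - fB) - (fA - f0)),
])
```

Under these assumptions the dummy-coded coefficients reproduce the factor separation terms, while the effects-coded coefficients equal the grand mean followed by one-half of each main and interaction effect.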

7. Simple versus main effect models

DoE simple effects calculate the difference in the response between when A is present and when A is absent, all other factors absent, or the interaction effect of A and another factor, all the other factors absent (Collins et al. 2009). On the other hand, DoE main effects calculate the difference between when A is present and when A is absent, averaging over all levels of the other factors. The main effect answers questions about the average effect of A or the average interaction effect of A and another factor combined (Collins et al. 2009).

As demonstrated in the previous section, simple effects are identical to the coefficients of a multiple linear regression with 0 and 1 dummy coding (Kugler et al. 2012). Main and interaction effects are twice the coefficients of a multiple linear regression with −1 and 1 effects coding (Kugler et al. 2012). Even though the coefficients from these two coding methods differ, the predictions are identical because the coding simply shifts and scales the variables (Hardy 1993).

To demonstrate this, we use the effects from the two methods to build models that predict the output for different configurations, writing the linear regression equation for each coding scheme. For simplicity, we show this for the two-factor case. We assume replication is present because, without replication, both models simply reproduce the data. FS would be performed on the mean of the replications of each run. Table 2 gives the averaged runs for the experiment, and Table 3 shows the computations for the effect estimates.

Table 2. The $2^2$ factorial design with replication.

Table 3. The $2^2$ factorial effects.

The simple effects model is created from the factor separation effects:

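$$\mathrm{FS}(x_1, x_2) = \hat{f}_0 + \hat{f}_1 x_1 + \hat{f}_2 x_2 + \hat{f}_{12} x_1 x_2, \qquad 0 \le x_1, x_2 \le 1. \tag{34}$$

We construct the main effects model using $w_i$ instead of $x_i$ because of the difference in coding. Here $\bar{z}$ is the average of all the runs, known as the “grand mean.” The main effects model for two variables is

$$\mathrm{DOE}(w_1, w_2) = \bar{z} + \frac{\mathrm{ME}(A)}{2} w_1 + \frac{\mathrm{ME}(B)}{2} w_2 + \frac{\mathrm{INT}(A, B)}{2} w_1 w_2, \qquad -1 \le w_1, w_2 \le 1. \tag{35}$$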

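Notice that we can convert between $x_i$ and $w_i$:

$$w_1 = 2x_1 - 1, \tag{36}$$

$$w_2 = 2x_2 - 1. \tag{37}$$

By substituting $w_1$ and $w_2$ into the DoE model we find

$$\mathrm{DOE}(x_1, x_2) = \bar{z} + \frac{\mathrm{ME}(A)}{2}(2x_1 - 1) + \frac{\mathrm{ME}(B)}{2}(2x_2 - 1) + \frac{\mathrm{INT}(A, B)}{2}(2x_1 - 1)(2x_2 - 1). \tag{38}$$

Substituting the equations for $\mathrm{ME}(A)$, $\mathrm{ME}(B)$, $\mathrm{INT}(A, B)$, and $\bar{z}$ into Eq. (38) and simplifying yields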

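$$\mathrm{DOE}(x_1, x_2) = \bar{f}_0 - \bar{f}_0 x_1 - \bar{f}_0 x_2 + \bar{f}_A x_1 + \bar{f}_B x_2 + \bar{f}_0 x_1 x_2 - \bar{f}_A x_1 x_2 - \bar{f}_B x_1 x_2 + \bar{f}_{AB} x_1 x_2. \tag{39}$$

Similarly, substituting the equations for $\hat{f}_0$, $\hat{f}_1$, $\hat{f}_2$, and $\hat{f}_{12}$ into the factor separation model in Eq. (34) gives us

$$\mathrm{FS}(x_1, x_2) = \bar{f}_0 - \bar{f}_0 x_1 - \bar{f}_0 x_2 + \bar{f}_A x_1 + \bar{f}_B x_2 + \bar{f}_0 x_1 x_2 - \bar{f}_A x_1 x_2 - \bar{f}_B x_1 x_2 + \bar{f}_{AB} x_1 x_2. \tag{40}$$

Notice that

$$\mathrm{DOE}(x_1, x_2) = \mathrm{FS}(x_1, x_2). \tag{41}$$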

Thus, both models will predict the same output for any given input. While this proof is only for two variables, it is known that effects coding and dummy coding always create identical predictions for any number of variables. The only difference is in interpreting the coefficients (Kugler et al. 2012). In fact Hardy (1993) puts it well: “The different coding scheme affects the way the information is captured—the manner in which group differences are arrayed—but it does not affect the overall picture because the underlying structure remains unchanged from earlier estimations; we simply view it from a different angle.”
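The two-variable equivalence derived above can also be checked numerically. The following sketch builds both models from four made-up replicate means (hypothetical values, not taken from Table 2) and compares their predictions at random points in the unit square. The function names `fs_model` and `doe_model` are assumptions for illustration.

```python
import random

# Hypothetical replicate means for the runs f0, fA, fB, fAB.
f0, fA, fB, fAB = 10.0, 12.0, 13.0, 18.0

# Factor separation (simple) effects, dummy coding.
fs_0, fs_1, fs_2, fs_12 = f0, fA - f0, fB - f0, fAB - fA - fB + f0

# DoE grand mean, main effects, and interaction effect, effects coding.
grand_mean = (f0 + fA + fB + fAB) / 4
me_a = (fA + fAB) / 2 - (f0 + fB) / 2
me_b = (fB + fAB) / 2 - (f0 + fA) / 2
int_ab = 0.5 * ((fAB - fB) - (fA - f0))


def fs_model(x1, x2):
    """Simple effects model with x1, x2 in [0, 1]."""
    return fs_0 + fs_1 * x1 + fs_2 * x2 + fs_12 * x1 * x2


def doe_model(x1, x2):
    """Main effects model after converting x to w = 2x - 1."""
    w1, w2 = 2 * x1 - 1, 2 * x2 - 1
    return grand_mean + me_a / 2 * w1 + me_b / 2 * w2 + int_ab / 2 * w1 * w2


# Compare the two models at random points in the unit square.
rng = random.Random(0)
points = [(rng.random(), rng.random()) for _ in range(1000)]
max_gap = max(abs(fs_model(x1, x2) - doe_model(x1, x2)) for x1, x2 in points)
```

The largest gap between the two predictions is at the level of floating-point rounding, and both models return the four run means at the corners of the design.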

8. Shallow-water equations example

We demonstrate FS and DoE main effects using a shallow-water equations model coded in Python by Paul Connolly that we modified for our use. The equations were solved using the Lax–Wendroff method (Connolly 2018). This shallow-water model allows for varying different parameters. We investigate how the different parameters affect the simulation output. In particular, for this experiment our goal was to understand how the simulated surface height changed with varying wind speed and latitude.

The height field was initialized so that the initial wind was uniform westerly in the northern half of the domain and uniform easterly in the southern half, creating a sharp shear. Because Stein and Alpert (1993) varied two parameters in their example, surface fluxes and terrain, we likewise varied two parameters: latitude (factor A) and wind speed (factor B). We placed the domain at a latitude of 20° as the low setting and moved it to a latitude of 45° as the high setting. We used 20 m s$^{-1}$ for the low wind speed and 50 m s$^{-1}$ for the high wind speed. The surface height at 30 h into the simulation, before the point when the sharp shear starts to become unstable, was used as the response variable.

The four simulation runs of the full factorial design are listed in Table 4, and the results are shown in Figs. 1–4.

Table 4. The four shallow-water model simulations.
Fig. 1. SWE simulation result run 1: All factors low.

Citation: Journal of the Atmospheric Sciences 77, 7; 10.1175/JAS-D-19-0263.1

Fig. 2. SWE simulation result run 2: Wind speed low, latitude high.

Fig. 3. SWE simulation result run 3: Wind speed high, latitude low.

Fig. 4. SWE simulation result run 4: Wind speed high, latitude high.

We then calculated the DoE main effects (Figs. 5–8) and FS effects (Figs. 9–12) at every point in the domain and compared the results. Note that the scales differ between the grand mean or base effect figures and the remaining effect figures.

Fig. 5. Design of experiments: Grand mean of all SWE runs.

Fig. 6. Design of experiments: Main effect of higher latitude.

Fig. 7. Design of experiments: Main effect of higher wind speed.

Fig. 8. Design of experiments: Interaction effect of higher wind speed and latitude.

Fig. 9. Factor separation: Zero-state low latitude, low wind speed.

Fig. 10. Factor separation: Contribution (simple effect) of higher latitude.

Fig. 11. Factor separation: Contribution (simple effect) of higher wind speed.

Fig. 12. Factor separation: Synergy of higher wind speed and latitude.

Comparing DoE main effects and FS effects, it is clear that the results are similar but not the same. Figures 5 and 9 are the constants for the regression equations. DoE main effects use the grand mean in Fig. 5 as the constant and compare changes in factor levels to this mean, while FS uses the base case (the run with both factors at their low settings) in Fig. 9 as the constant and compares changes to this base.

Figures 6 and 10 both show the effect of changing the latitude to 45°. While they have the same general shape, the main effect of latitude in Fig. 6 has a steeper slope than the pure contribution of latitude in Fig. 10. This is because the main effect calculates the effect of latitude at both low and high wind speeds while FS calculates the effect of latitude just at low wind speeds. High wind speeds at high latitudes increase the slope, which is shown in the main effect of latitude. Notice the FS interaction in Fig. 12 has a steeper slope than the DoE main effects interaction in Fig. 8 so the information that higher latitude and higher wind speeds increase the slope is contained in the FS interaction in Fig. 12.
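
The relationship between the two sets of figures can be written out explicitly. As a sketch, assume the Wu and Hamada (2009) scaling for the main effect and interaction, and let $y_{ab}$ (our shorthand, introduced only for this illustration) denote the response with latitude at level $a$ and wind speed at level $b$, where 0 is low and 1 is high. With the FS terms $\hat{f}_A = y_{10} - y_{00}$ and $\hat{f}_{AB} = y_{11} - y_{10} - y_{01} + y_{00}$, it follows that

```latex
\begin{align}
\mathrm{ME}(A) &= \tfrac{1}{2}\left[(y_{10} - y_{00}) + (y_{11} - y_{01})\right]
                = \hat{f}_A + \tfrac{1}{2}\hat{f}_{AB}, \\
\mathrm{INT}(A,B) &= \tfrac{1}{2}\left[(y_{11} - y_{01}) - (y_{10} - y_{00})\right]
                = \tfrac{1}{2}\hat{f}_{AB}.
\end{align}
```

Under this assumed scaling, the main effect of latitude carries half of the synergy in addition to the pure contribution, and the FS synergy is twice the DoE interaction. This is consistent with the steeper slopes noted for Fig. 6 relative to Fig. 10 and for Fig. 12 relative to Fig. 8.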

Thus, both methods tell us that in the SWE simulations the factors each individually increase the slope as well as interact together to increase the slope as the factors increase.

9. Advantages of DoE for factor separation

Alpert and Sholokhman (2011b) compared FS effects to main effects, which they called “factorial modeling,” and concluded FS works better for atmospheric simulations. They did identify a few areas of difficulty with FS, especially that of the exponentially increasing computational cost. Here we address their assertions of the benefits of FS over main effects and demonstrate how DoE uses fractional factorial designs with smaller numbers of runs to decrease the computational cost. We briefly discuss the benefits of DoE for factors with more than two levels.

Alpert and Sholokhman (2011b) presented two main advantages of FS effects over DoE effects.

Stein and Alpert pointed out in their original paper, and then reiterated in the book that followed, that all $2^n$ FS pure contributions and synergies add up to the “full run” (all factors on), which they also called the control run (Stein and Alpert 1993; Alpert and Sholokhman 2011a). They assert this allows one to view the effects as a percentage of the full run. For example, using simple numbers in a two-factor experiment, if the pure contribution of a factor A is $\hat{f}_A = 2$ and the full run result is $f_{AB} = 8$, then the pure contribution of A is 25% of the full run. This can be helpful, but it only applies to the control run. It would be a mistake to then say A contributes 25% of the result of any run including A. Thus, the usefulness of the percentage-wise analysis is limited.
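
The limitation can be checked with a small numerical sketch. The four run values below are hypothetical, chosen only so that the pure contribution of A is 2 and the full run is 8 as in the example above; the variable names are ours.

```python
# Hypothetical run results for a two-factor experiment (illustrative numbers only).
f_0, f_A, f_B, f_AB = 1.0, 3.0, 4.0, 8.0

# FS terms following the Stein and Alpert (1993) definitions.
pure_A = f_A - f_0
pure_B = f_B - f_0
synergy_AB = f_AB - f_A - f_B + f_0

# The zero state, pure contributions, and synergy add up to the full run.
total = f_0 + pure_A + pure_B + synergy_AB

# Share of A's pure contribution in the full run versus in the A-only run.
share_in_full_run = pure_A / f_AB
share_in_run_A = pure_A / f_A
```

With these numbers the pure contribution of A is 25% of the full run but two-thirds of the run in which only A is on, so the percentage cannot be carried from one run to another.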

They also discussed the importance of the zero-state run with all factors off and pointed out, “It is shown here and in many earlier FS studies that in the atmosphere the zero-state contribution can be significant and should be calculated” (Alpert and Sholokhman 2011b). Because the main and interaction effects do not calculate this zero state, Alpert and Sholokhman assert FS is a better method. It is true the zero state is not calculated as a main effect, but it is always one of the runs in a full factorial model so that information still exists when using DoE.

The biggest difference between FS and DoE is that FS effects are compared to the zero state while main effects are compared to the grand mean. Depending on the purpose of the experiment, it is sometimes more helpful to compare the effects to the grand mean and other times to compare them to the zero state.

When a full factorial design can be used, both types of effects are easily computed. In fact, using both methods allows an even better glimpse into how the factors contribute to the results. As Alpert and Sholokhman (2011b) mention, however, a full factorial design can have a high computational cost as the number of factors grows, while leaving a factor out of the analysis can have unintended consequences since synergistic contributions of the factors will then be wrapped into the other contributions. One solution is to not use a full factorial design. Because the higher-order synergies often become small, they may be neglected, and a full factorial is not necessary in order to calculate the first- and second-order contributions alone (Alpert and Sholokhman 2011b).

This concept that the higher-order synergies may not be important is known in the design of experiments as the effect hierarchy principle. This principle is stated in Wu and Hamada (2009) as follows:

  • “Lower order effects are more likely to be important than higher order effects.”

  • “Effects of the same order are equally likely to be important.”

This principle does not always hold, but it is useful for experiments with few runs but a large number of factors (Wu and Hamada 2009).

The design of experiments deals extensively with the problem of not being able to run a full factorial experiment. This is addressed in the next section.

a. Fractional factorial designs and aliasing

Design of experiments offers fractional factorial designs, also known as $2^{k-p}$ designs, that use $2^{k-p}$ runs instead of $2^k$ runs for k factors (Wu and Hamada 2009). These can significantly decrease the computational cost of the design, but the consequence of this is a concept known as aliasing. Aliasing of effects occurs when the data cannot distinguish the estimate of one effect from the estimate of another effect (Wu and Hamada 2009). An example of a fractional factorial design and the resulting aliased effects can be found in the appendix.

Note fractional factorial designs should not be confused with the concept of fractional treatment as proposed by Krichak and Alpert (2002). Fractional treatments deal with varying the intensities of the factors, which is not what is being discussed here.

The trade-off for fewer runs is aliasing, but there are ways to still gather information from aliased effects. Because of the effect hierarchy principle, the lower-order effects are more likely to be significant than the higher-order ones, so the higher-order interaction of an alias in many cases may be ignored (Wu and Hamada 2009). Previous knowledge of effect importance can also help in sorting through the aliased effects.

“Dealiasing” effects is a current area of study in the design of experiments where inferences about the effects can still be made from aliased effects. While Wu (2018) admits, “When the experiment does not contain enough information, one cannot squeeze too much out of it,” strides are being made in the ability to dealias effects. It is out of the scope of this paper to go into detail of the current techniques for dealiasing, but one relevant emerging technique is that of using the conditional main effect (CME) when the main effect is aliased (Su and Wu 2017; Mak and Wu 2019; Peng 2018). CMEs are actually similar, and in some cases identical, to FS effects. They are identical when the CME is conditioned on all other factors being zero. The CME method trades out aliased effects with CMEs. This combination of using both traditional effects and conditional main effects seems promising (e.g., Yoshida 2018).
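
To make the link between CMEs and FS effects concrete, the sketch below uses a two-factor example with hypothetical responses. It assumes the conditional main effect of A given B is the change in the response when A moves from low to high with B held fixed, which is our reading of Su and Wu (2017); the dictionary layout and function name are ours.

```python
# Hypothetical responses y[(a, b)], with a the level of A and b the level of B
# (0 = low, 1 = high); the numbers are illustrative only.
y = {(0, 0): 1.0, (1, 0): 3.0, (0, 1): 4.0, (1, 1): 8.0}

def cme_a_given_b(y, b):
    """Conditional main effect of A with B held at level b (assumed definition)."""
    return y[(1, b)] - y[(0, b)]

cme_low = cme_a_given_b(y, 0)    # effect of A when B is low
cme_high = cme_a_given_b(y, 1)   # effect of A when B is high

fs_pure_a = y[(1, 0)] - y[(0, 0)]      # FS pure contribution of A
me_a = 0.5 * (cme_high + cme_low)      # main effect of A as the average of the CMEs
int_ab = 0.5 * (cme_high - cme_low)    # interaction as half the difference of the CMEs
```

Here the CME of A conditioned on B at its low level equals the FS pure contribution of A, while the main effect averages the two CMEs.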

b. Simple effects of fractional designs

We considered whether FS effects, or simple effects, can be computed from the results of a fractional experiment. While we do not have a complete answer at this time, we know it is not always possible to perform FS as described by Stein and Alpert if there are missing runs. For example, if the run with just factor A is not completed, then FS cannot be performed to find the pure contribution of A. An advantage of using main effects is that they can always be calculated: the main effect of A can still be computed using the other runs, though it may be aliased with another effect.
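
A small sketch illustrates the point for a one-half fraction of a four-factor design. The generator D = ABC and the random responses are assumptions made for illustration and are not results from this study.

```python
import itertools
import numpy as np

# Half fraction of a 2^4 design (levels coded -1/+1), generated with D = ABC.
# This generator is an assumption made for illustration.
abc = np.array(list(itertools.product([-1, 1], repeat=3)))
design = np.column_stack([abc, abc.prod(axis=1)])  # columns A, B, C, D

# Check which runs needed by FS are present in the fraction.
runs = {tuple(row) for row in design.tolist()}
zero_state_present = (-1, -1, -1, -1) in runs
a_only_present = (1, -1, -1, -1) in runs

# Hypothetical responses, one per run (random placeholders, not model output).
rng = np.random.default_rng(0)
y = rng.normal(size=len(design))

# The main effect of A uses all eight runs: mean at A high minus mean at A low.
a = design[:, 0]
me_a = y[a == 1].mean() - y[a == -1].mean()
```

In this fraction the zero-state run is present but the run with only A at its high level is not, so the FS pure contribution of A cannot be formed, whereas the main effect of A is still available from the eight completed runs.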

c. Factors with more than two levels

One last advantage of using DoE to approach such experiments is that there are methods for handling factors with more than two levels (Wu and Hamada 2009). Krichak and Alpert (2002) only briefly address the idea of exploring values in between the high and low levels. Since few factors are as simple as on and off, the ability to incorporate more levels is helpful.

10. Conclusions

The ability to tease out factor effects and interactions from a simulation is invaluable for many disciplines, and the benefits of factor separation for the atmospheric sciences are numerous. We showed here that FS effects are the same as simple effects, a tool used in the design of experiments. FS and DoE main effects both correspond to multiple linear regression coefficients with different coding schemes, and when used to create models, the two methods produce identical output (Kugler et al. 2012). The interpretation of the simple effects, or FS effects, is different from that of the main effects, but in general, as demonstrated with the shallow-water equations, we can come to the same conclusions using either method (Hardy 1993).
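
The correspondence with regression coefficients can be sketched with a saturated two-factor model fit under both coding schemes. The response values are hypothetical and the helper name `design_matrix` is ours; the example assumes dummy coding uses levels 0 and 1 and effect coding uses levels −1 and +1.

```python
import numpy as np

# Hypothetical responses, ordered as runs
# (A, B) = (low, low), (high, low), (low, high), (high, high).
y = np.array([1.0, 3.0, 4.0, 8.0])
a01 = np.array([0.0, 1.0, 0.0, 1.0])
b01 = np.array([0.0, 0.0, 1.0, 1.0])

def design_matrix(a, b):
    """Columns: intercept, A, B, and the A-by-B product."""
    return np.column_stack([np.ones_like(a), a, b, a * b])

X_dummy = design_matrix(a01, b01)                   # dummy coding (0/1)
X_effect = design_matrix(2 * a01 - 1, 2 * b01 - 1)  # effect coding (-1/+1)

# Four runs and four coefficients, so each model fits the data exactly.
beta_dummy = np.linalg.solve(X_dummy, y)
beta_effect = np.linalg.solve(X_effect, y)

fitted_dummy = X_dummy @ beta_dummy
fitted_effect = X_effect @ beta_effect
```

With dummy coding the coefficients are the zero state, the two pure contributions, and the synergy. With effect coding the intercept is the grand mean and the remaining coefficients are half of the main effects and interaction under the Wu and Hamada (2009) scaling. The fitted values agree between the two models.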

In the case of a small number of factors with two levels, where a full factorial design is possible, neither method is inherently better, but one way may be better suited to the specific situation and questions being asked. It is critical, however, that whichever method is used is made clear since the interpretations are different.

We recommend continued investigation into how the design of experiments can benefit the atmospheric sciences beyond the FS method since the DoE approach includes fractional factorial designs as well as the ability to account for more than two levels, allowing for greater applicability to atmospheric simulations.

Acknowledgments

The authors wish to acknowledge Dr. Ligia Bernardet of NOAA/GSD, the lead for DTC’s model test bed, who asked one of us (Smith) how DoE related to factor separation at the 2018 Annual AMS meeting in response to our paper presented therein. Her question sparked this research. Research was sponsored by the U.S. CCDC Army Research Laboratory and was accomplished under Cooperative Agreement W911NF-18-2-0252. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the U.S. CCDC Army Research Laboratory or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation herein.

APPENDIX

A Fractional Factorial Design Example

As an example of a fractional design, Table A1 is a one-half fraction design for four factors, A, B, C, and D. A full factorial for four factors would require $2^4 = 16$ runs, but this design has $2^{4-1} = 8$ runs.

Table A1. Resolution IV $2^{4-1}$ fractional design.

In the design matrix, there are four columns for the main effects, six columns for the two-factor interactions, four columns for the three-factor interactions, and one column for the four-factor interaction. These are generated by multiplying the corresponding A, B, C, and D columns. For example, column AB is generated by multiplying pairwise the entries for A and B in each row. This is a simple way to see which effects are aliased. If the columns for two effects are identical, the effects are aliased. We denote aliased effects by A = BCD to show column A and column BCD are identical and thus ME(A) and INT(BCD) are aliased. The aliased effects for the design in Table A1 are as follows:

A=BCD,
B=ACD,
C=ABD,
D=ABC,
AB=CD,
AC=BD,
AD=BC,
I=ABCD,

where I = ABCD, the column with all ones, is known as the “defining relation.” This type of design is a Resolution IV design because of the four letters in the defining relation (Wu and Hamada 2009).

The main effects are each aliased with a three-way interaction effect. Using the effect hierarchy principle, if we assume the three-way interactions are zero, we can estimate the main effects. The two-way interactions, however, cannot be estimated individually because each is aliased with another two-way interaction.
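
The column-multiplication check described above can be sketched in a few lines of Python. The code assumes the fraction is generated by setting D = ABC, which is consistent with the defining relation I = ABCD; the run order and the variable names are ours and may differ from Table A1.

```python
import itertools
import numpy as np

# Full 2^3 design in A, B, C (coded -1/+1); D is generated as D = ABC,
# which gives the defining relation I = ABCD (assumed generator).
base = np.array(list(itertools.product([-1, 1], repeat=3)))
A, B, C = base[:, 0], base[:, 1], base[:, 2]
D = A * B * C
cols = {"A": A, "B": B, "C": C, "D": D}

def effect_column(name):
    """Multiply the factor columns named in `name`, e.g. 'AB' -> A * B."""
    col = np.ones(len(A), dtype=int)
    for letter in name:
        col = col * cols[letter]
    return col

# All 15 effects: 4 main effects, 6 two-factor, 4 three-factor, 1 four-factor.
effects = ["".join(c) for r in range(1, 5)
           for c in itertools.combinations("ABCD", r)]

# Effects whose columns are identical are aliased with each other.
aliases = {}
for name in effects:
    aliases.setdefault(tuple(effect_column(name)), []).append(name)

alias_groups = sorted(aliases.values())
for group in alias_groups:
    print(" = ".join(group))
```

The ABCD column is the column of all ones, so it forms a group by itself, corresponding to the defining relation.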

REFERENCES

  • Alpert, P., 2011: Meso-meteorology: Factor separation examples in atmospheric meso-scale motions. Factor Separation in the Atmosphere: Applications and Future Prospects, P. Alpert and T. Sholokhman, Eds., Cambridge University Press, 53–66.

  • Alpert, P., and T. Sholokhman, Eds., 2011a: Factor Separation in the Atmosphere: Applications and Future Prospects. Cambridge University Press, 274 pp.

  • Alpert, P., and T. Sholokhman, 2011b: Some difficulties and prospects. Factor Separation in the Atmosphere: Applications and Future Prospects, P. Alpert and T. Sholokhman, Eds., Cambridge University Press, 237–244.

  • Berger, A., M. Claussen, and Q. Yin, 2011: Factor separation methodology and paleoclimates. Factor Separation in the Atmosphere: Applications and Future Prospects, P. Alpert and T. Sholokhman, Eds., Cambridge University Press, 28–52.

  • Collins, L. M., J. J. Dziak, and R. Z. Li, 2009: Design of experiments with multiple independent variables: A resource management perspective on complete and reduced factorial designs. Psychol. Methods, 14, 202–224, https://doi.org/10.1037/a0015826.

  • Collins, L. M., J. J. Dziak, K. C. Kugler, and J. B. Trail, 2014: Factorial experiments: Efficient tools for evaluation of intervention components. Amer. J. Prev. Med., 47, 498–504, https://doi.org/10.1016/j.amepre.2014.06.021.

  • Connolly, P. J., 2018: Shallow water practice model, version 1.0.0. Zenodo, https://doi.org/10.5281/zenodo.1478060.

  • Fisher, R. A., 1971: The Design of Experiments. 8th ed. Hafner Publishing Company, 248 pp.

  • Hardy, M. A., 1993: Regression with dummy variables. Quantitative Applications in the Social Sciences, SAGE Publications, 90 pp.

  • Krichak, S. O., and P. Alpert, 2002: A fractional approach to the factor separation method. J. Atmos. Sci., 59, 2243–2252, https://doi.org/10.1175/1520-0469(2002)059<2243:AFATTF>2.0.CO;2.

  • Kugler, K. C., J. B. Trail, J. J. Dziak, and L. M. Collins, 2012: Effect coding versus dummy coding in analysis of data from factorial experiments. The Pennsylvania State University Tech. Rep., http://methodology.psu.edu/media/techreports/12-120.pdf.

  • Mak, S., and C. F. J. Wu, 2019: cmenet: A new method for bi-level variable selection of conditional main effects. J. Amer. Stat. Assoc., 114, 844–856, https://doi.org/10.1080/01621459.2018.1448828.

  • Montgomery, D. C., 2013: Design and Analysis of Experiments. John Wiley and Sons, 730 pp.

  • National Research Council, 1995: Statistical Methods for Testing and Evaluating Defense Systems: Interim Report. National Academies Press, 84 pp., https://doi.org/10.17226/9074.

  • Peng, C.-Y., 2018: Discussion. Ann. Inst. Stat. Math., 70, 269–274, https://doi.org/10.1007/s10463-017-0640-y.

  • Reuter, G. W., 2011: Application of the factor separation methodology to quantify the effect of waste heat, vapor and pollution on cumulus convection. Factor Separation in the Atmosphere: Applications and Future Prospects, P. Alpert and T. Sholokhman, Eds., Cambridge University Press, 163–170.

  • Smith, J. A., and R. S. Penc, 2016: A design of experiments approach to evaluating parameterization schemes for numerical weather prediction: Problem definition and proposed solution approach. Conf. on Applied Statistics in Defense, Fairfax, VA, Interface Foundation of North America and George Mason University College of Science, 4183–4192.

  • Smith, J. A., R. S. Penc, and J. W. Raby, 2018: Statistical design of experiments in numerical weather prediction: Emerging results. 25th Conf. on Probability and Statistics, Austin, TX, Amer. Meteor. Soc., 6.1, https://ams.confex.com/ams/98Annual/meetingapp.cgi/Paper/326537.

  • Smith, J. A., R. S. Penc, J. W. Raby, and J. L. Cleveland, 2019: Some conclusions on applying statistical design of experiments to numerical weather prediction. 18th Conf. on Artificial and Computational Intelligence and its Applications to the Environmental Sciences, Phoenix, AZ, Amer. Meteor. Soc., TJ17.4, https://ams.confex.com/ams/2019Annual/meetingapp.cgi/Paper/352596.

  • Stein, U., and P. Alpert, 1993: Factor separation in numerical simulations. J. Atmos. Sci., 50, 2107–2115, https://doi.org/10.1175/1520-0469(1993)050<2107:FSINS>2.0.CO;2.

  • Su, H., and C. F. J. Wu, 2017: CME analysis: A new method for unraveling aliased effects in two-level fractional factorial experiments. J. Qual. Technol., 49, 1–10, https://doi.org/10.1080/00224065.2017.11918181.

  • Thunis, P., and Coauthors, 2019: Source apportionment to support air quality planning: Strengths and weaknesses of existing approaches. Environ. Int., 130, 104825, https://doi.org/10.1016/j.envint.2019.05.019.

  • Waugh, D. W., A. M. Hogg, P. Spence, M. H. England, and T. W. N. Haine, 2019: Response of Southern Ocean ventilation to changes in midlatitude westerly winds. J. Climate, 32, 5345–5361, https://doi.org/10.1175/JCLI-D-19-0039.1.

  • Wu, C. F. J., 2018: Rejoinder. Ann. Inst. Stat. Math., 70, 279–281, https://doi.org/10.1007/s10463-017-0639-4.

  • Wu, C. F. J., and M. S. Hamada, 2009: Experiments: Planning, Analysis, and Optimization. 2nd ed. Wiley Series in Probability and Statistics, John Wiley and Sons, 716 pp.

  • Yang, L., J. Smith, and D. Niyogi, 2019: Urban impacts on extreme monsoon rainfall and flooding in complex terrain. Geophys. Res. Lett., 46, 5918–5927, https://doi.org/10.1029/2019GL083363.

  • Yoshida, R., 2018: Discussion on the paper by Professor Wu. Ann. Inst. Stat. Math., 70, 275–278, https://doi.org/10.1007/s10463-017-0641-x.