Simulating the Influence of Type II Error on the Outcome of Past Statistical Experiments

  • a University of North Carolina at Asheville, Asheville, North Carolina
  • b Bureau of Reclamation, Denver, Colorado

Abstract

Simulations of randomized winter orographic weather modification experiments were used to explore a possible cause of the many inconclusive results from previous statistical experiments. There is increasing evidence that the response to cloud treatment is highly variable due to differences in the availability of cloud liquid water, seeding agents, targeting effectiveness, and other factors. For this reason, the simulations described in this paper focus on the sensitivity of previously applied statistical techniques to different responses to seeding. Data for the simulations came from two sources: the Bridger Range Experiment (BRE), conducted during the two winters from 1970 to 1972, and SNOTEL (snowpack telemetry) data from the Boise, Idaho, area during the winters of 1985–92. The principal focus is on the BRE data, from which 6-h experimental units were extracted, because previous analyses of these data support the notion of a variable treatment response. Twenty-four-hour experimental units from the BRE and Idaho datasets were also incorporated into the simulations.

The simulations indicate that the power of the test is sensitive to the fraction of seeded units that had a treatment response: power was significantly reduced as the fraction of seeded units showing a response decreased. It is suggested that past estimates of experimental duration, based on the simple model that assumed all seeded units have the same response, were overly optimistic. The results may partially explain the high frequency of inconclusive results from past statistical cloud seeding experiments. The implications of these results for past and future statistical weather modification experiments are described.

Monte Carlo techniques were applied in simulations that assumed a randomized target-control experiment. Five models were applied, which involved adding a percentage or constant response to all or a fraction of the seeded units and capping the maximum increment. Experimental units were randomly selected from a pool of nonseeded cases. The selected units were randomly seeded or not seeded, and the seeded units were again randomly selected so that all or a fraction of them showed a treatment effect, while keeping the net seasonal response approximately constant. For example, in the case of one out of three seeded units showing a treatment response, that unit would have triple the response of the simple model, in which each seeded unit showed a response. Net treatment responses were taken from the most successful partitions found for the BRE. Experimental units were added until a 0.05 one-tailed P level was achieved, where P is the probability of incorrectly concluding that there is a positive seeding effect when none exists (type I error). Each simulation was repeated 1000 times to estimate the number of experimental units needed to reach a specified power level (1 − β), where β is the probability of a type II error, that is, the probability of not detecting a treatment response when one exists.
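
The procedure described above can be illustrated with a minimal Python sketch. It is not the authors' code, and several pieces are assumptions made only for illustration: the pool of nonseeded units is a hypothetical gamma-distributed sample rather than BRE or SNOTEL data, only a constant-increment response is modeled (no percentage response and no cap), and significance is judged with a simple normal-approximation test on the difference of means rather than the target-control statistics used in the study. The names `RunningStats`, `one_tailed_p`, `units_to_significance`, and `units_for_power` are hypothetical. A fraction f of seeded units receives base_increment / f, so the net response stays approximately constant; with f = 1/3, each responding unit gets triple the simple-model response.

```python
import math
import random
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, used for the approximate test


class RunningStats:
    """Welford running mean and sample variance for one group of units."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def add(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        return self._m2 / (self.n - 1)


def one_tailed_p(seeded, control):
    """Normal-approximation p-value for H1: mean(seeded) > mean(control).

    Assumption: this simple test stands in for the study's statistics.
    """
    se = math.sqrt(seeded.variance / seeded.n + control.variance / control.n)
    if se == 0.0:
        return 1.0
    z = (seeded.mean - control.mean) / se
    return 1.0 - _STD_NORMAL.cdf(z)


def units_to_significance(pool, rng, base_increment, fraction,
                          alpha=0.05, start=20, step=10, max_units=1000):
    """Add units until the one-tailed p-value reaches alpha.

    fraction must lie in (0, 1]. Returns the number of units used,
    or None if max_units is exhausted first.
    """
    seeded, control = RunningStats(), RunningStats()
    total = 0
    while total < max_units:
        batch = start if total == 0 else step
        for _ in range(batch):
            value = rng.choice(pool)         # draw from the nonseeded pool
            if rng.random() < 0.5:           # random seed / no-seed decision
                if rng.random() < fraction:  # only a fraction of seeded units respond
                    value += base_increment / fraction  # keeps net response constant
                seeded.add(value)
            else:
                control.add(value)
        total += batch
        if min(seeded.n, control.n) >= 2 and one_tailed_p(seeded, control) <= alpha:
            return total
    return None


def units_for_power(pool, base_increment, fraction, power=0.8,
                    trials=1000, seed=0, **kwargs):
    """Smallest unit count at which at least `power` of the trials had succeeded."""
    rng = random.Random(seed)
    counts = [units_to_significance(pool, rng, base_increment, fraction, **kwargs)
              for _ in range(trials)]
    finite = sorted(c for c in counts if c is not None)
    needed = math.ceil(power * trials)
    return finite[needed - 1] if len(finite) >= needed else None


if __name__ == "__main__":
    pool_rng = random.Random(42)
    # Hypothetical stand-in for the pool of nonseeded units (not BRE or SNOTEL data).
    pool = [pool_rng.gammavariate(2.0, 1.0) for _ in range(500)]
    for fraction in (1.0, 1 / 3):
        # 200 repetitions keep the demo quick; the text describes 1000.
        n_units = units_for_power(pool, base_increment=0.3, fraction=fraction, trials=200)
        print(f"fraction responding = {fraction:.2f}: units for 80% power = {n_units}")
```

One caveat about the sketch: checking significance after every batch of added units gives the test repeated chances to cross the threshold, which generally inflates the type I error rate above the nominal 0.05. The unit counts it prints are therefore illustrative of the qualitative effect of a variable response, not estimates comparable to those in the paper.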
