<br />tions about exploratory findings is that of multiplicity.
<br />The chance that, in multiple analyses of the same data,
<br />some "effect" will be found to be "one-percent signifi-
<br />cant" increases rapidly with the number of analyses per-
<br />fonned, as well as with the number of other, a priori
<br />reasonable, analyses which were not carried out be-
<br />cause the data were counter-indicative. Running a
<br />large number of analyses and treating them as a priori
<br />single tests could well be described as artificial stimula-
<br />tion of significance. A P-value of one percent for some
<br />obs~rved pattern has quite a different import from a one-
<br />percent significant result on the test of the principal
<br />hypothesis of the experiment. And yet, it has some
<br />weight, if only to lead one to reanalyze other experi-
<br />ments for repetitions of the same pattern, or to design
<br />new experiments for that purpose.
<br />As regards methodology, one must be aware that the
<br />multiplicity of explorat.ory analyses, mostly unforeseen
<br />at the design stage, does not fit into the Neyman-Pearson
<br />two-decision framework. Nor is it possible to "optimize"
<br />all these analyses simultaneously, or even to ascertain
<br />all the distributions involved. It is therefore necessary
<br />to use robust techniques which are more generally valid
<br />and give rough indications of the extent of statistical
<br />support. Nonparametric tests, especially permutation
<br />tests which do not rely on distributional assumptions or
<br />independence of observations, are suitable here, as is
<br />jackknifing.
<br />Braham comments that meteorolgists tend to evaluate
<br />findings in terms of the physical sense they seem to make,
<br />rather than merely in terms of the statistical support
<br />they have. :Meteorologists are completely right in doing
<br />so. If an exploratory analysis reveals one pattern at
<br />P = '0.10 that makes good physical sense and another
<br />pattern at P ,; 0.05 that does not fit in with anything
<br />the meteorologist knows, he will obviously, and justifi-
<br />ably, concentrate on the first pattern rather than on the
<br />second. A single experiment's P-value is only one element
<br />of all the evidence the meteorologist must muster in
<br />evaluating a finding.
<br />To return to Braham's question, the extent to which a
<br />statistician should be involved in the meteorology of a
<br />cloud seeding experiment depends on the stage of experi-
<br />mentation and analysis. The design stage must be a joint
<br />venture in which the meteorologist determines the goals
<br />of the experiment and proposes the inain variables and
<br />hypotheses to be studied. The statistician can then bring
<br />the methods of mathematical statistics to bear on efficient
<br />experimental design. The more the statistician under-
<br />stands the cloud physics involved and the characteristics
<br />of meteorological data to be studied, the better he can
<br />serve the meteorologist's purpose. If the statistician does
<br />not understand the meteorological rationale of the experi-
<br />ment, or if he is ignorant of the special features of the
<br />variables considered, his design is likely to remain a
<br />largely irrelevant txercise in mathematical statistics. An
<br />example is the derivation of optimal parametric tests
<br />under assumptions of unit-to-unit independence-even
<br />
<br />, .
<br />
<br />:t~._r~~J"r~-t~~~;.)~'o.-';,;d4:-'-,,_.~~r_~i~._,i_&_'~;;;',,;_
<br />
<br />t",,~,~_._~
<br />
<br />
<br />Journal of the American Statistical Association, March 1979
<br />
<br />though it is well-known that there is considerable serial
<br />correIa.tion in precipitation.
<br />After the design (1) comes the execution (2), followed
<br />by calculation of the preordained test and estimate (3).
<br />In theory, all this is determined at the design stage, and
<br />the experimenter may follow an unequivocal protocol
<br />through stage (2) to the conclusion of his experiment. In
<br />practice, things do not work quite that way-measure-
<br />ments often cannot be taken as planned, treatment
<br />methods may vary, some definitions are likely to be
<br />found impractical, some units must be changed, and test
<br />statistiics can almost always be improved upon. Who is
<br />to decide whether these changes are permissible within
<br />the predetermined design?
<br />To illustrate from the Israeli .rainfall stimulation ex-
<br />periment: The 8 P.M. to 8 P.M. daily unit initially used
<br />required continuous-time recording gages. When these
<br />proved difficult to obtain, was it permissible to substitute
<br />ordinary gages and thereafter continue the experiment
<br />on an 8 A.M. to 8 A.M. basis (Gabriel and Neumann
<br />1978)? When the seeding plane pilots were found to prefer
<br />flying within sight of the coa.stline rather than farther
<br />out, as originally planned, was it appropriate to shift the
<br />definition of the target accordingly? When a suitable
<br />concomitant variable was found, was it permissible to
<br />redesi~:n the significance test to take that variable into
<br />account? When local observations, as well as publica-
<br />tions from abroad, suggested that simple rank analyses
<br />would not efficiently take into account occasional large
<br />seeding; effects, was it permissible to change from the
<br />proposed Wilcoxon-Mann- Whitney test to one which
<br />now looked more powerful? When the . pilots went on
<br />strike, could the lost days be omitted from the analyses,
<br />and how was the randomization to continue after they
<br />returned to work-as allocated for that date or as al-
<br />located for the first day they were on strike?
<br />This role of the statistician is not foreseen in the texts.
<br />He has to serve as an interpreter and arbiter on the
<br />design and on changes made during experimentation. Ap-
<br />parently his understanding of the function and rationale
<br />of randomization is also required in supervising that this
<br />is actually carried out correctly. For similar reasons, the
<br />statistician may also be asked to supervise the collation
<br />of datl~ and calculation of the teEt statistic.
<br />This quality control on the statistical aspects of ex-
<br />perimentationis a function which statisticians cannot
<br />carry lOut if they are too deeply involved in an experi-
<br />ment. If they are on the experimental team, or, worse yet,
<br />employed by the meteorologists, their objectivity may be
<br />suspect. Indeed, their ability to insist on exact adherence
<br />to protocol requires not only independence but also
<br />suitablle status-it may be difficult for a fresh statistics
<br />Ph.D. to insist that an eminent cloud physicist stick to
<br />some seemingly trivial but possibly crucial point in a
<br />randomization procedure. And indeed the authority and
<br />presti~:e of the statisticians involved may well be adduced
<br />as evidence that the procedures were properly carried,
<br />