er examples, which are what Abelson (1997) referred to as "gratuitous" significance testing: testing what is already known.
Three comments can be made in favor of the point null hypothesis, such as µ = µ0. First, while such hypotheses are virtually always false for sampling studies, they may be reasonable for experimental studies in which subjects are randomly assigned to treatment groups (Mulaik et al. 1997). Second, testing a point null hypothesis in fact does provide a reasonable approximation to a more appropriate question: is µ nearly equal to µ0 (Berger and Delampady 1987, Berger and Sellke 1987), if the sample size is modest (Rindskopf 1997). Large sample sizes will result in small P-values even if µ is nearly equal to µ0. Third, testing the point null hypothesis is mathematically much easier than testing composite null hypotheses, which involve noncentrality parameters (Steiger and Fouladi 1997).
The bottom line on P-values is that they relate to data that were not observed, under a model that is known to be false. How meaningful can they be? But they are objective, at least; or are they?
P is Arbitrary
If the null hypothesis truly is false (as most of those tested really are), then P can be made as small as one wishes, by getting a large enough sample. P is a function of (1) the difference between reality and the null hypothesis, and (2) the sample size. Suppose, for example, that you are testing to see if the mean of a population (µ) is, say, 100. The null hypothesis then is H0: µ = 100, versus the alternative hypothesis of H1: µ ≠ 100. One might use Student's t-test, which is
t = [(x̄ − 100)/S] √(n − 1),
where x̄ is the mean of the sample, S is the standard deviation of the sample, and n is the sample size. Clearly, t can be made arbitrarily large (and the P-value associated with it arbitrarily small) by making either (x̄ − 100) or √(n − 1) large enough. As the sample size increases, |x̄ − 100| and S will approximately stabilize at the true parameter values. Hence, a large value of n translates into a large value of t. This strong dependence of P on the sample size led Good (1982) to suggest that P-values be standardized to a sample size of 100, by replacing P with P√(n/100) (or 0.5, if that is smaller).
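To make the dependence concrete, here is a minimal sketch of my own (not from the text; the true mean of 101, the SD of 15, and the random seed are invented assumptions) that computes the t-statistic above and Good's standardized P as the sample size grows:

```python
# Sketch: P shrinks with n when H0 is only slightly false; Good's (1982)
# standardization rescales P to an effective sample size of 100.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
mu_true, mu_null = 101.0, 100.0  # reality is only 1 unit from H0

for n in (25, 100, 400, 1600):
    x = rng.normal(mu_true, 15.0, size=n)      # invented data, SD = 15
    xbar, s = x.mean(), x.std()                # S here uses divisor n
    t = (xbar - mu_null) / s * np.sqrt(n - 1)  # the t-statistic above
    p = 2 * stats.t.sf(abs(t), df=n - 1)       # two-sided P-value
    p_std = min(p * np.sqrt(n / 100), 0.5)     # Good's standardized P
    print(f"n={n:5d}  t={t:6.2f}  P={p:.4f}  standardized P={p_std:.4f}")
```

As n grows, t inflates and P collapses toward zero even though µ differs from 100 by a trivial amount, while the standardized P is far less sensitive to n.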
Even more arbitrary in a sense than P is the use of a standard cutoff value, usually denoted α. P-values less than or equal to α are deemed significant; those greater than α are nonsignificant. Use of α was advocated by Jerzy Neyman and Egon Pearson, whereas R. A. Fisher recommended presentation of observed P-values instead (Huberty 1993). Use of a fixed α level, say α = 0.05, promotes the seemingly nonsensical distinction between a significant finding if P = 0.049, and a nonsignificant finding if P = 0.051. Such minor differences are illusory anyway, as they derive from tests whose assumptions often are only approximately met (Preece 1990). Fisher objected to the Neyman-Pearson procedure because of its mechanical, automated nature (Mulaik et al. 1997).
Proving the Null Hypothesis
Discourses on hypothesis testing emphasize that null hypotheses cannot be proved; they can only be disproved (rejected). Failing to reject a null hypothesis does not mean that it is true. Especially with small samples, one must be careful not to accept the null hypothesis. Consider a test of the null hypothesis that a mean µ equals µ0. The situations illustrated in Figure 1 both reflect a failure to reject that hypothesis. Figure 1A suggests the null hypothesis may well be false, but the sample was too small to indicate significance: there is a lack of power. Conversely, Figure 1B shows that the data truly were consistent with the null hypothesis. The 2 situations should lead to different conclusions about µ, but the P-values associated with the tests are identical.
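The point can be illustrated with a small sketch of my own (the means, SDs, and sample sizes are invented, not taken from Figure 1): two tests of H0: µ = 100 can both fail to reject with comparable P-values, while the confidence intervals reveal one sample as merely underpowered and the other as genuinely consistent with the null:

```python
# Sketch: identical nonsignificance, very different evidence about mu.
import numpy as np
from scipy import stats

def summarize(x, mu0=100.0):
    """t-test of H0: mu = mu0 plus a 95% confidence interval for mu."""
    n = len(x)
    xbar, se = np.mean(x), np.std(x, ddof=1) / np.sqrt(n)
    t = (xbar - mu0) / se
    p = 2 * stats.t.sf(abs(t), df=n - 1)
    half = stats.t.ppf(0.975, df=n - 1) * se
    return n, xbar, p, (xbar - half, xbar + half)

rng = np.random.default_rng(7)
small = rng.normal(110.0, 30.0, size=8)    # Fig. 1A-like: small n, wide CI
large = rng.normal(100.5, 30.0, size=800)  # Fig. 1B-like: large n, narrow CI

for label, x in (("small sample", small), ("large sample", large)):
    n, xbar, p, (lo, hi) = summarize(x)
    print(f"{label}: n={n}, mean={xbar:.1f}, P={p:.2f}, "
          f"95% CI=({lo:.1f}, {hi:.1f})")
```

The wide interval from the small sample is compatible with large departures from 100 (a power problem); the narrow interval from the large sample pins µ close to 100.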
Taking another look at the 2 issues of The Journal of Wildlife Management, I noted a number of articles that indicated a null hypothesis was proven. Among these were (1) no difference in slope aspect of random snags (P = 0.113, n = 57), (2) no difference in viable seeds (F3,6 = 3.14, P = 0.11), (3) lamb kill was not correlated to trapper hours (r12 = 0.70, P = 0.094), (4) no effect due to month (P = 0.07, n = 14), and (5) no significant differences in survival distributions (P-values ≥ 0.014, n variable). I selected the examples to illustrate null hypotheses claimed to be true, despite small sample sizes and P-values that were small but (usually) >0.05. All examples, I believe, reflect the lack of power (Fig. 1A) while claiming a lack of effect (Fig. 1B).
Power Analysis
Power analysis is an adjunct to hypothesis testing that has become increasingly popular.
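As a hedged illustration of the idea (my own sketch, not a method from the text; the effect size, SD, and α are assumptions), power can be estimated by simulation: assume a true mean, draw repeated samples, and count how often the t-test rejects H0:

```python
# Sketch: simulation-based power of a two-sided one-sample t-test.
import numpy as np
from scipy import stats

def simulated_power(mu_true, n, mu0=100.0, sd=15.0, alpha=0.05, reps=10_000):
    """Fraction of simulated samples in which the t-test rejects H0: mu = mu0."""
    rng = np.random.default_rng(0)
    rejections = 0
    for _ in range(reps):
        x = rng.normal(mu_true, sd, size=n)
        t = (x.mean() - mu0) / (x.std(ddof=1) / np.sqrt(n))
        if 2 * stats.t.sf(abs(t), df=n - 1) <= alpha:
            rejections += 1
    return rejections / reps

# Power grows with n for a fixed (assumed) true difference of 5 units.
for n in (10, 30, 100):
    print(n, simulated_power(mu_true=105.0, n=n))
```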