…examples, which are what Abelson (1997) referred to as "gratuitous" significance testing: testing what is already known.

Three comments in favor of the point null hypothesis, such as µ = µ₀, are in order. First, while such hypotheses are virtually always false for sampling studies, they may be reasonable for experimental studies in which subjects are randomly assigned to treatment groups (Mulaik et al. 1997). Second, testing a point null hypothesis in fact does provide a reasonable approximation to a more appropriate question (is µ nearly equal to µ₀? Berger and Delampady 1987, Berger and Sellke 1987), as long as the sample size is modest (Rindskopf 1997). Large sample sizes will result in small P-values even if µ is nearly equal to µ₀. Third, testing the point null hypothesis is mathematically much easier than testing composite null hypotheses, which involve noncentrality parameters (Steiger and Fouladi 1997).

The bottom line on P-values is that they relate to data that were not observed, under a model that is known to be false. How meaningful can they be? But they are objective, at least; or are they?

P is Arbitrary

If the null hypothesis truly is false (as most of those tested really are), then P can be made as small as one wishes by getting a large enough sample. P is a function of (1) the difference between reality and the null hypothesis and (2) the sample size. Suppose, for example, that you are testing to see if the mean of a population (µ) is, say, 100. The null hypothesis then is H₀: µ = 100, versus the alternative hypothesis of H₁: µ ≠ 100.
One might use Student's t-test, which is

t = [(x̄ − 100) / S] √(n − 1),

where x̄ is the mean of the sample, S is the standard deviation of the sample, and n is the sample size. Clearly, t can be made arbitrarily large (and the P-value associated with it arbitrarily small) by making either (x̄ − 100) or √(n − 1) large enough. As the sample size increases, (x̄ − 100) and S will approximately stabilize at the true parameter values. Hence, a large value of n translates into a large value of t. This strong dependence of P on the sample size led Good (1982) to suggest that P-values be standardized to a sample size of 100, by replacing P with P√(n/100) (or 0.5, if that is smaller).

Even more arbitrary in a sense than P is the use of a standard cutoff value, usually denoted α. P-values less than or equal to α are deemed significant; those greater than α are nonsignificant. Use of α was advocated by Jerzy Neyman and Egon Pearson, whereas R. A. Fisher recommended presentation of observed P-values instead (Huberty 1993). Use of a fixed α level, say α = 0.05, promotes the seemingly nonsensical distinction between a significant finding if P = 0.049 and a nonsignificant finding if P = 0.051. Such minor differences are illusory anyway, as they derive from tests whose assumptions often are only approximately met (Preece 1990). Fisher objected to the Neyman-Pearson procedure because of its mechanical, automated nature (Mulaik et al. 1997).

Proving the Null Hypothesis

Discourses on hypothesis testing emphasize that null hypotheses cannot be proved; they can only be disproved (rejected). Failing to reject a null hypothesis does not mean that it is true.
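The sample-size dependence of t and P discussed under "P is Arbitrary" can be sketched numerically. The following Python fragment is an illustration added here, not part of the original article: the function names, the fixed sample values (x̄ = 101, S = 10), and the use of a normal approximation in place of the exact t distribution are all assumptions of this sketch.

```python
import math

def t_statistic(xbar, s, n, mu0=100.0):
    """t = (xbar - mu0) / s * sqrt(n - 1), the formula from the text."""
    return (xbar - mu0) / s * math.sqrt(n - 1)

def two_sided_p(t):
    """Two-sided P-value for t via the normal approximation
    (adequate for large n; avoids a dependency on scipy.stats)."""
    return math.erfc(abs(t) / math.sqrt(2))

# Hold the observed mean (101) and SD (10) fixed and let n grow:
# t grows like sqrt(n - 1), so P can be driven as low as one wishes.
for n in (25, 100, 400, 1600):
    t = t_statistic(101.0, 10.0, n)
    p = two_sided_p(t)
    # Good's (1982) standardization to a sample size of 100
    p_std = min(p * math.sqrt(n / 100.0), 0.5)
    print(f"n={n:5d}  t={t:5.2f}  P={p:.4f}  standardized P={p_std:.4f}")
```

Running the loop shows the raw P-value collapsing toward zero as n increases, even though the observed discrepancy from the null (1 unit on a SD of 10) never changes; Good's standardization dampens, though it does not remove, that dependence.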
<br />Especially with small samples, one must be <br />careful not to accept the null hypothesis. Con- <br />sider atest of the null hypothesis- that a mean <br />µ equals µ,,. The situations illustrated in Figure <br />1 both reflect a failure to reject that hypothesis. <br />Figure 1~ suggests the null hypothesis may well <br />be faise. but the sample was too small to indi- <br />cate significance: there is a lacl- of power. Con- <br />versely. Figure 1B shows that the data truly <br />were consistent with the null hypothesis. The ? <br />situations should lead to dilTerent conclusions <br />about µ. but the P-values associated with the <br />tests are identical. <br />Taking another look ut the ?issues of Thy <br />frncrnnl o{ tt'ilrllifr• 1•(cnra~ernenr. 1 noted a num- <br />ber of articles that indicated a mrll hypothesis <br />w~u nrrn~en..~moug these were ~ no <lifference <br />in slope aspect of random sna_s P = 0.113, n = <br />.~-'~, ~:•?: uo diffrreace in yiuble s~tals iF,,; _ :i.l-i, <br />P = 1.11;, i:~1 lalnh 1.111 yvkLS ^~~t CY)rrelated t0 <br />trapper banes i1-;_ = 0.70. P = 0.09.1, (-1) no <br />effect clue to month (P = 0.07. n = 1•~), and (~ <br />nu sighhiflcant differences in sur.iyal clistribntions <br />(P-vatlues ? O.Oi-I', n ya:iahle). I selected the ex- <br />amples to ilhstrate null h~potlheses clauuhed to <br />he tnre. despite small sample sizes and P-yc~l::es <br />that weer small hnt (nsnally-) ~0.(-.5..~.ll exam- <br />ples. [ belies r-, rrHrct tl:e lack of power (Fi;. l.-~i <br />while claiming a lack of effect ~ Fig,. 1B1. <br />Power Analysis <br />Power ~u:alvsis is au adjunct to hypothesis <br />testing that has become iucreasiugh• popular <br />