er examples, which are what Abelson (1997) referred to as "gratuitous" significance testing: testing what is already known.
Three comments can be made in favor of the point null hypothesis, such as µ = µ0. First, while such hypotheses are virtually always false for sampling studies, they may be reasonable for experimental studies in which subjects are randomly assigned to treatment groups (Mulaik et al. 1997). Second, testing a point null hypothesis in fact does provide a reasonable approximation to a more appropriate question: is µ nearly equal to µ0 (Berger and Delampady 1987, Berger and Sellke 1987), if the sample size is modest (Rindskopf 1997). Large sample sizes will result in small P-values even if µ is nearly equal to µ0. Third, testing the point null hypothesis is mathematically much easier than testing composite null hypotheses, which involve noncentrality parameters (Steiger and Fouladi 1997).
The bottom line on P-values is that they relate to data that were not observed, under a model that is known to be false. How meaningful can they be? But they are objective, at least; or are they?
P is Arbitrary
If the null hypothesis truly is false (as most of those tested really are), then P can be made as small as one wishes, by getting a large enough sample. P is a function of (1) the difference between reality and the null hypothesis, and (2) the sample size. Suppose, for example, that you are testing to see if the mean of a population (µ) is, say, 100. The null hypothesis then is H0: µ = 100, versus the alternative hypothesis of H1: µ ≠ 100. One might use Student's t-test, which is
t = [(x̄ − 100)/S] √(n − 1),
where x̄ is the mean of the sample, S is the standard deviation of the sample, and n is the sample size. Clearly, t can be made arbitrarily large (and the P-value associated with it arbitrarily small) by making either (x̄ − 100) or √(n − 1) large enough. As the sample size increases, |x̄ − 100| and S will approximately stabilize at the true parameter values. Hence, a large value of n translates into a large value of t. This strong dependence of P on the sample size led Good (1982) to suggest that P-values be standardized to a sample size of 100, by replacing P with P√(n/100) (or 0.5, if that is smaller).
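To make the dependence concrete, here is a minimal sketch of my own (not from the text; the true mean of 101, the SD of 15, and the random seed are invented assumptions) that computes the t-statistic above and Good's standardized P as the sample size grows:

```python
# Sketch: P shrinks with n when H0 is only slightly false; Good's (1982)
# standardization rescales P to an effective sample size of 100.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
mu_true, mu_null = 101.0, 100.0  # reality is only 1 unit from H0

for n in (25, 100, 400, 1600):
    x = rng.normal(mu_true, 15.0, size=n)      # invented data, SD = 15
    xbar, s = x.mean(), x.std()                # S here uses divisor n
    t = (xbar - mu_null) / s * np.sqrt(n - 1)  # the t-statistic above
    p = 2 * stats.t.sf(abs(t), df=n - 1)       # two-sided P-value
    p_std = min(p * np.sqrt(n / 100), 0.5)     # Good's standardized P
    print(f"n={n:5d}  t={t:6.2f}  P={p:.4f}  standardized P={p_std:.4f}")
```

As n grows, t inflates and P collapses toward zero even though µ differs from 100 by a trivial amount, while the standardized P is far less sensitive to n.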
Even more arbitrary in a sense than P is the use of a standard cutoff value, usually denoted α. P-values less than or equal to α are deemed significant; those greater than α are nonsignificant. Use of α was advocated by Jerzy Neyman and Egon Pearson, whereas R. A. Fisher recommended presentation of observed P-values instead (Huberty 1993). Use of a fixed α level, say α = 0.05, promotes the seemingly nonsensical distinction between a significant finding if P = 0.049, and a nonsignificant finding if P = 0.051. Such minor differences are illusory anyway, as they derive from tests whose assumptions often are only approximately met (Preece 1990). Fisher objected to the Neyman-Pearson procedure because of its mechanical, automated nature (Mulaik et al. 1997).
Proving the Null Hypothesis
Discourses on hypothesis testing emphasize that null hypotheses cannot be proved; they can only be disproved (rejected). Failing to reject a null hypothesis does not mean that it is true. Especially with small samples, one must be careful not to accept the null hypothesis. Consider a test of the null hypothesis that a mean µ equals µ0. The situations illustrated in Figure 1 both reflect a failure to reject that hypothesis. Figure 1A suggests the null hypothesis may well be false, but the sample was too small to indicate significance: there is a lack of power. Conversely, Figure 1B shows that the data truly were consistent with the null hypothesis. The 2 situations should lead to different conclusions about µ, but the P-values associated with the tests are identical.
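The point can be illustrated with a small sketch of my own (the means, SDs, and sample sizes are invented, not taken from Figure 1): two tests of H0: µ = 100 can both fail to reject with comparable P-values, while the confidence intervals reveal one sample as merely underpowered and the other as genuinely consistent with the null:

```python
# Sketch: identical nonsignificance, very different evidence about mu.
import numpy as np
from scipy import stats

def summarize(x, mu0=100.0):
    """t-test of H0: mu = mu0 plus a 95% confidence interval for mu."""
    n = len(x)
    xbar, se = np.mean(x), np.std(x, ddof=1) / np.sqrt(n)
    t = (xbar - mu0) / se
    p = 2 * stats.t.sf(abs(t), df=n - 1)
    half = stats.t.ppf(0.975, df=n - 1) * se
    return n, xbar, p, (xbar - half, xbar + half)

rng = np.random.default_rng(7)
small = rng.normal(110.0, 30.0, size=8)    # Fig. 1A-like: small n, wide CI
large = rng.normal(100.5, 30.0, size=800)  # Fig. 1B-like: large n, narrow CI

for label, x in (("small sample", small), ("large sample", large)):
    n, xbar, p, (lo, hi) = summarize(x)
    print(f"{label}: n={n}, mean={xbar:.1f}, P={p:.2f}, "
          f"95% CI=({lo:.1f}, {hi:.1f})")
```

The wide interval from the small sample is compatible with large departures from 100 (a power problem); the narrow interval from the large sample pins µ close to 100.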
Taking another look at the 2 issues of The Journal of Wildlife Management, I noted a number of articles that indicated a null hypothesis was proven. Among these were (1) no difference in slope aspect of random snags (P = 0.113, n = 57), (2) no difference in viable seeds (F3,6 = 3.14, P = 0.11), (3) lamb kill was not correlated to trapper hours (r12 = 0.70, P = 0.094), (4) no effect due to month (P = 0.07, n = 14), and (5) no significant differences in survival distributions (P-values ≥ 0.014, n variable). I selected the examples to illustrate null hypotheses claimed to be true, despite small sample sizes and P-values that were small but (usually) >0.05. All examples, I believe, reflect the lack of power (Fig. 1A) while claiming a lack of effect (Fig. 1B).
Power Analysis
Power analysis is an adjunct to hypothesis testing that has become increasingly popular.
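As a hedged illustration of the idea (my own sketch, not a method from the text; the effect size, SD, and α are assumptions), power can be estimated by simulation: assume a true mean, draw repeated samples, and count how often the t-test rejects H0:

```python
# Sketch: simulation-based power of a two-sided one-sample t-test.
import numpy as np
from scipy import stats

def simulated_power(mu_true, n, mu0=100.0, sd=15.0, alpha=0.05, reps=10_000):
    """Fraction of simulated samples in which the t-test rejects H0: mu = mu0."""
    rng = np.random.default_rng(0)
    rejections = 0
    for _ in range(reps):
        x = rng.normal(mu_true, sd, size=n)
        t = (x.mean() - mu0) / (x.std(ddof=1) / np.sqrt(n))
        if 2 * stats.t.sf(abs(t), df=n - 1) <= alpha:
            rejections += 1
    return rejections / reps

# Power grows with n for a fixed (assumed) true difference of 5 units.
for n in (10, 30, 100):
    print(n, simulated_power(mu_true=105.0, n=n))
```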