the same result if the experiment were repeated. Significant differences are often termed "reliable" under this interpretation.

Alternatively, P can be treated as the probability that the null hypothesis is true. This interpretation is the most direct one, as it addresses head-on the question that interests the investigator.

These 3 interpretations are what Carver (1978) termed fantasies about statistical significance. None of them is true, although they are treated as if they were true in some statistical textbooks and applications papers. Small values of P are taken to represent strong evidence that the null hypothesis is false, but workers demonstrated long ago (see references in Berger and Sellke 1987) that such is not the case. In fact, Berger and Sellke (1987) gave an example for which a P-value of 0.05 was attained with a sample of n = 50, but the probability that the null hypothesis was true was 0.52. Further, the disparity between P and $\Pr[H_0 \mid \text{data}]$, the probability of the null hypothesis given the observed data, increases as samples become larger.

In reality, P is $\Pr[\text{observed or more extreme data} \mid H_0]$, the probability of the observed data or data more extreme, given that the null hypothesis is true, the assumed model is correct, and the sampling was done randomly. Let us consider the first 2 assumptions.

What Are More Extreme Data?

Suppose you have a sample consisting of 10 males and 3 females. For a null hypothesis of a balanced sex ratio, what samples would be more extreme? The answer to that question depends on the sampling plan used to collect the data (i.e., what stopping rule was used).
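A minimal sketch of how the stopping rule alone changes P for the sample of 10 males and 3 females. It assumes a one-sided test against an excess of males, Pr(male) = 1/2 under the null hypothesis, and independent draws; the function names are hypothetical. Three plans are compared: a total of 13 individuals fixed in advance (binomial), sampling until the 10th male (negative binomial in females), and sampling until the 3rd female (negative binomial in males). Under the last plan the set of more extreme outcomes is infinite, so P is computed as 1 minus the probability of the finitely many less extreme outcomes.

```python
from fractions import Fraction
from math import comb

HALF = Fraction(1, 2)  # Pr(male) under the null hypothesis of a balanced sex ratio


def p_fixed_total(males: int, females: int) -> Fraction:
    """Total sample size fixed in advance (binomial).
    Sums the observed outcome and those with more males out of the same total."""
    n = males + females
    return sum(comb(n, k) * HALF**n for k in range(males, n + 1))


def p_stop_at_males(males: int, females: int) -> Fraction:
    """Sampling stops at the `males`-th male (negative binomial in females).
    Sums the observed outcome and those with the same males but fewer females."""
    # Pr[f females before the last male] = C(males - 1 + f, f) * (1/2)**(males + f)
    return sum(comb(males - 1 + f, f) * HALF**(males + f) for f in range(females + 1))


def p_stop_at_females(males: int, females: int) -> Fraction:
    """Sampling stops at the `females`-th female (negative binomial in males).
    Observed or more extreme = `males` or more males before the last female;
    that set is infinite, so subtract the less extreme outcomes from 1."""
    # Pr[m males before the last female] = C(females - 1 + m, m) * (1/2)**(females + m)
    less_extreme = sum(comb(females - 1 + m, m) * HALF**(females + m) for m in range(males))
    return 1 - less_extreme


for plan, p_value in [
    ("fixed total of 13", p_fixed_total(10, 3)),
    ("stop at 10th male", p_stop_at_males(10, 3)),
    ("stop at 3rd female", p_stop_at_females(10, 3)),
]:
    print(f"{plan}: P = {p_value} ({float(p_value):.4f})")
```

Under these assumptions the fixed-total and stop-at-10th-male plans both give P = 189/4096 (about 0.046), while the stop-at-3rd-female plan gives P = 79/4096 (about 0.019), although the data are identical. Exact fractions are used so the tail sums carry no rounding error.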
The most obvious answer is based on the assumption that a total of 13 individuals was sampled. In that case, outcomes more extreme than 10 males and 3 females would be 11 males and 2 females, 12 males and 1 female, and 13 males and no females.

However, the investigator might have decided to stop sampling as soon as he encountered 10 males. Were that the situation, the possible outcomes more extreme against the null hypothesis would be 10 males and 2 females, 10 males and 1 female, and 10 males and no females. Conversely, the investigator might have collected data until 3 females were encountered. The more extreme outcomes are then infinite in number: they include 11 males and 3 females, 12 males and 3 females, 13 males and 3 females, etc. Alternatively, the investigator might have collected data until the difference between the numbers of males and females was 7, or until the difference was significant at some level. Each set of more extreme outcomes has its own probability, which, along with the probability of the result actually obtained, constitutes P.

The point is that determining which outcomes of an experiment or survey are more extreme than the observed one, so that a P-value can be calculated, requires knowledge of the intentions of the investigator (Berger and Berry 1988). Hence P, the outcome of a statistical hypothesis test, depends on results that were not obtained (that is, on something that did not happen) and on what the intentions of the investigator were.

Are Null Hypotheses Really True?

P is calculated under the assumption that the null hypothesis is true.
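Because P is computed assuming $H_0$ is true, it is not $\Pr[H_0 \mid \text{data}]$. A minimal sketch contrasting the two for a two-sided z-test, assuming a normal mean with known $\sigma$, prior probability $\pi_0 = 1/2$ on the point null $\theta = \theta_0$, and a $N(\theta_0, \sigma^2)$ prior on $\theta$ under the alternative. This prior setting is our assumption; it reproduces the Berger and Sellke (1987) figure of 0.52 for P = 0.05 and n = 50, but other priors give other numbers. With $z$ the test statistic, the Bayes factor and posterior are

$$B_{01} = \sqrt{1+n}\,\exp\!\left(-\frac{z^2}{2}\cdot\frac{n}{1+n}\right), \qquad \Pr[H_0 \mid \text{data}] = \left(1 + \frac{1-\pi_0}{\pi_0}\cdot\frac{1}{B_{01}}\right)^{-1}.$$

The function name below is hypothetical.

```python
from math import exp, sqrt
from statistics import NormalDist


def posterior_null(p_value: float, n: int, prior_null: float = 0.5) -> float:
    """Pr[H0 | data] for a two-sided z-test that returned `p_value`.

    Assumed model (illustrative only): xbar ~ N(theta, sigma^2 / n) with sigma
    known, H0: theta = theta0 with prior probability `prior_null`, and
    theta ~ N(theta0, sigma^2) under the alternative.
    """
    # |z| statistic that produces this two-sided P-value
    z = NormalDist().inv_cdf(1 - p_value / 2)
    # Bayes factor in favor of H0 under the assumed priors
    bf01 = sqrt(1 + n) * exp(-0.5 * z * z * n / (1 + n))
    # Posterior odds = Bayes factor * prior odds; convert odds to a probability
    post_odds = bf01 * prior_null / (1 - prior_null)
    return post_odds / (1 + post_odds)


# Hold P fixed at 0.05 and let the sample size grow
for n in (1, 10, 50, 100, 1000):
    print(f"n = {n:4d}: Pr[H0 | data] = {posterior_null(0.05, n):.2f}")
```

Holding P at 0.05, the posterior probability of $H_0$ in this setting rises from about 0.35 at n = 1 to about 0.82 at n = 1000, which illustrates the growing disparity between P and $\Pr[H_0 \mid \text{data}]$ as samples become larger.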
Most null hypotheses tested, however, state that some parameter equals zero, or that some set of parameters are all equal. These hypotheses, called point null hypotheses, are almost invariably known to be false before any data are collected (Berkson 1938, Savage 1957, Johnson 1995). If such hypotheses are not rejected, it is usually because the sample size is too small (Nunnally 1960).

To see if the null hypotheses being tested in The Journal of Wildlife Management can validly be considered to be true, I arbitrarily selected 2 issues: an issue from the 1996 volume, the other from 1998. I scanned the results section of each paper, looking for P-values. For each P-value I found, I looked back to see what hypothesis was being tested. I made a very biased selection of some conclusions reached by rejecting null hypotheses; these include: (1) the occurrence of sheep remains in coyote (Canis latrans) scats differed among seasons (P = 0.03, n = 467), (2) duckling body mass differed among years (P < 0.0001), and (3) the density of large trees was greater in unlogged forest stands than in logged stands (P = 0.03). (The last is my personal favorite.) Certainly we knew before any data were collected that the null hypotheses being tested were false. Sheep remains certainly must have varied among seasons, if only between 61.1% in 1 season and 61.2% in another. The only question was whether or not the sample size was sufficient to detect the difference. Likewise, we know before data are collected that there are real differences in the oth-