the same result if the experiment were repeated.
<br />Significant differences are often termed "reliable"
<br />under this interpretation.
<br />Alternatively, P can be treated as the probability
<br />that the null hypothesis is true. This interpretation
<br />is the most direct one, as it addresses
<br />head-on the question that interests the
<br />investigator.
<br />These 3 interpretations are what Carver
<br />(1978) termed fantasies about statistical signifi-
<br />cance. None of them is true, although they are
<br />treated as if they were true in some statistical
<br />textbooks and applications papers. Small values
<br />of P are taken to represent strong evidence that
<br />the null hypothesis is false, but workers demonstrated
<br />long ago (see references in Berger
<br />and Sellke 1987) that such is not the case. In
<br />fact, Berger and Sellke (1987) gave an example
<br />for which a P-value of 0.05 was attained with a
<br />sample of n = 50, but the probability that the
<br />null hypothesis was true was 0.52. Further, the
<br />disparity between P and Pr[H0 | data], the probability
<br />of the null hypothesis given the observed
<br />data, increases as samples become larger.
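<br />The gap between P and Pr[H0 | data] can be illustrated with a minimal sketch. It assumes a test of a normal mean with known variance, a prior probability of 0.5 on the null hypothesis, and, under the alternative, a normal prior centered on the null value with the same variance as a single observation. These assumptions and the function name are illustrative choices, not details stated in the text above; under them, a result with two-sided P = 0.05 (z = 1.96) at n = 50 gives a posterior probability near 0.52, and the probability rises as n grows.

```python
import math


def posterior_null_probability(z, n, prior_null=0.5):
    """Pr[H0 | data] for a point null on a normal mean (illustrative setup).

    Assumptions: known variance, prior probability `prior_null` on H0,
    and under the alternative the mean has a normal prior centered on
    the null value with variance equal to that of one observation.
    """
    # Bayes factor in favor of H0 for an observed test statistic z.
    bayes_factor = math.sqrt(1 + n) * math.exp(-0.5 * z * z * n / (n + 1))
    prior_odds = prior_null / (1 - prior_null)
    posterior_odds = prior_odds * bayes_factor
    return posterior_odds / (1 + posterior_odds)


Z_FOR_P_05 = 1.96  # z value giving a two-sided P of 0.05

for n in (10, 50, 100, 1000):
    # The same P = 0.05 corresponds to a larger Pr[H0 | data] as n grows.
    print(n, round(posterior_null_probability(Z_FOR_P_05, n), 2))
```

Holding P fixed at 0.05 while n increases shows the disparity directly: the evidence against the null hypothesis weakens even though the P-value does not change.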
<br />In reality, P is Pr[observed or more extreme
<br />data | H0], the probability of the observed
<br />data or data more extreme, given that the null
<br />hypothesis is true, the assumed model is correct,
<br />and the sampling was done randomly. Let
<br />us consider the first 2 assumptions.
<br />What Are More Extreme Data?
<br />Suppose you have a sample consisting of 10
<br />males and 3 females. For a null hypothesis of a
<br />balanced sex ratio, what samples would be more
<br />extreme? The answer to that question depends
<br />on the sampling plan used to collect the data
<br />(i.e., what stopping rule was used). The most
<br />obvious answer is based on the assumption that
<br />a total of 13 individuals was sampled. In that
<br />case, outcomes more extreme than 10 males
<br />and 3 females would be 11 males and 2 females,
<br />12 males and 1 female, and 13 males and no
<br />females.
<br />However, the investigator might have decided
<br />to stop sampling as soon as he encountered 10
<br />males. Were that the situation, the possible outcomes
<br />more extreme against the null hypothesis
<br />would be 10 males and 2 females, 10 males and
<br />1 female, and 10 males and no females. Conversely,
<br />the investigator might have collected data until
<br />3 females were encountered. The number of
<br />more extreme outcomes then is infinite: they include
<br />11 males and 3 females, 12 males and 3
<br />females, 13 males and 3 females, etc. Alternatively,
<br />the investigator might have collected data until
<br />the difference between the numbers of males and
<br />females was 7, or until the difference was significant
<br />at some level. Each set of more extreme
<br />outcomes has its own probability, which, along
<br />with the probability of the result actually ob-
<br />tained, constitutes P.
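<br />The dependence of P on the stopping rule can be sketched numerically for the sample of 10 males and 3 females. The sketch below assumes a one-sided test (an excess of males) under a balanced sex ratio, and the function names are hypothetical. It compares three sampling plans: a fixed total of 13 individuals, stopping at the 10th male, and stopping at the 3rd female.

```python
from math import comb

P_MALE = 0.5  # null hypothesis: balanced sex ratio
MALES, FEMALES = 10, 3  # the observed sample


def p_fixed_total(males, females, p=P_MALE):
    """Total fixed at males + females; extreme = at least `males` males."""
    n = males + females
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k)
               for k in range(males, n + 1))


def p_stop_at_males(males, females, p=P_MALE):
    """Stop at the `males`-th male; extreme = `females` or fewer females."""
    return sum(comb(f + males - 1, males - 1) * p**males * (1 - p) ** f
               for f in range(females + 1))


def p_stop_at_females(males, females, p=P_MALE):
    """Stop at the `females`-th female; extreme = `males` or more males."""
    # The set of more extreme outcomes is infinite, so take the complement
    # of the finite set of less extreme outcomes (fewer than `males` males).
    less_extreme = sum(
        comb(m + females - 1, females - 1) * p**m * (1 - p) ** females
        for m in range(males))
    return 1 - less_extreme


print("fixed total of 13:  ", round(p_fixed_total(MALES, FEMALES), 4))
print("stop at 10th male:  ", round(p_stop_at_males(MALES, FEMALES), 4))
print("stop at 3rd female: ", round(p_stop_at_females(MALES, FEMALES), 4))
```

Under these assumptions the fixed-total plan and the stop-at-10-males plan happen to give the same value (378/8192, about 0.046), whereas stopping at the 3rd female gives 79/4096, about 0.019. The observed data are identical in every case; only the investigator's sampling plan differs.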
<br />The point is that determining which outcomes
<br />of an experiment or survey are more extreme
<br />than the observed one, so a P-value can be calculated,
<br />requires knowledge of the intentions of
<br />the investigator (Berger and Berry 1988). Hence,
<br />P, the outcome of a statistical hypothesis test, depends
<br />on results that were not obtained; that is,
<br />something that did not happen, and what the
<br />intentions of the investigator were.
<br />Are Null Hypotheses Really True?
<br />P is calculated under the assumption that the
<br />null hypothesis is true. Most null hypotheses
<br />tested, however, state that some parameter
<br />equals zero, or that some set of parameters are
<br />all equal. These hypotheses, called point null
<br />hypotheses, are almost invariably known to be
<br />false before any data are collected (Berkson
<br />1938, Savage 1957, Johnson 1995). If such hypotheses
<br />are not rejected, it is usually because
<br />the sample size is too small (Nunnally 1960).
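<br />The role of sample size can be sketched with a two-sample comparison of proportions. The true proportions (0.50 and 0.51), the sample sizes, and the function names below are hypothetical, chosen only for illustration. As a simplification, the sketch evaluates the z statistic at the true proportions, ignoring sampling noise, to show the value a test would tend toward at each sample size.

```python
import math


def two_sided_p(z):
    """Two-sided P-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def z_at_true_proportions(p1, p2, n_per_group):
    """Pooled two-proportion z statistic evaluated at the true values.

    Simplification: sample proportions are set equal to the true ones,
    so sampling variability is ignored.
    """
    pooled = (p1 + p2) / 2
    standard_error = math.sqrt(2 * pooled * (1 - pooled) / n_per_group)
    return (p2 - p1) / standard_error


P1, P2 = 0.50, 0.51  # hypothetical: a small but real difference

for n in (100, 1_000, 10_000, 100_000):
    z = z_at_true_proportions(P1, P2, n)
    print(f"n per group = {n:>7}: z = {z:6.2f}, P = {two_sided_p(z):.3g}")
```

The difference between the two proportions never changes, yet the P-value shrinks steadily as n grows. With a large enough sample, any nonzero difference leads to rejection of the point null hypothesis.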
<br />To see if the null hypotheses being tested in
<br />The Journal of Wildlife Management can validly
<br />be considered to be true, I arbitrarily selected
<br />2 issues: an issue from the 1996 volume, the
<br />other from 1998. I scanned the results section
<br />of each paper, looking for P-values. For each
<br />P-value I found, I looked back to see what hypothesis
<br />was being tested. I made a very biased
<br />selection of some conclusions reached by rejecting
<br />null hypotheses; these include: (1) the
<br />occurrence of sheep remains in coyote (Canis
<br />latrans) scats differed among seasons (P = 0.03,
<br />n = 467), (2) duckling body mass differed
<br />among years (P < 0.0001), and (3) the density
<br />of large trees was greater in unlogged forest
<br />stands than in logged stands (P = 0.03). (The
<br />last is my personal favorite.) Certainly we knew
<br />before any data were collected that the null hypotheses
<br />being tested were false. Sheep remains
<br />certainly must have varied among seasons, if
<br />only between 61.1% in 1 season and 61.2% in
<br />another. The only question was whether or not
<br />the sample size was sufficient to detect the difference.
<br />Likewise, we know before data are collected
<br />that there are real differences in the oth-
<br />
|