The correct computation of confidence intervals requires that the distribution of the observations be known. Very often, however, approximations are close enough to correct to be useful, and they often are, or can be made to be, conservative. For confidence intervals for the mean, the normal distribution is usually assumed to apply for several reasons: the central limit theorem assures us that with large samples the mean is likely to be approximately normally distributed; the required computations are well known and easily applied; and when the normal distribution is known not to apply, a suitable transformation of the data is often available to allow a valid application.

The confidence interval for a mean is an interval within which the true mean is said to have some stated probability of being found. If the probability of the mean not being in the interval is $\alpha$ ($\alpha$ could equal .1, .05, .01, or any other probability value), then the statement may be written

$$P(CL_1 < \mu < CL_2) = 1 - \alpha$$

This is read, "The probability that the lower confidence limit ($CL_1$) is less than the true mean ($\mu$) and that the upper confidence limit ($CL_2$) is greater than the true mean equals $1 - \alpha$." However, we never know whether or not the true mean is actually included in the interval. The confidence interval statement is therefore really a statement about our procedure rather than about $\mu$. It says that if we follow the procedure in repeated experiments, a proportion $\alpha$ of those experiments will, by chance alone, fail to include the true mean between our limits. For example, if $\alpha = .05$, we can expect about 5 of every 100 confidence intervals to fail to include the true mean.
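The repeated-experiment interpretation can be checked by simulation. The Python sketch below is illustrative only. The population mean and standard deviation, the sample size N = 30, the number of replicates, and the function names `mean_confidence_limits` and `miss_rate` are hypothetical choices. The t value 2.045 is the standard table entry for 29 degrees of freedom at two-tailed $\alpha = .05$. Each simulated experiment draws a normal sample and computes $CL_1 = \bar{X} - t\,s_{\bar{X}}$ and $CL_2 = \bar{X} + t\,s_{\bar{X}}$. It then records whether the interval misses $\mu$. The proportion of misses should come out near $\alpha = .05$.

```python
import random
import statistics

# Hypothetical setup for this illustration (not from the text): samples of
# N = 30 drawn from a normal population with known mean MU and sd SIGMA.
N = 30
MU = 10.0
SIGMA = 2.0
# Student's t table value for two-tailed alpha = .05 and N - 1 = 29 df.
T_ALPHA = 2.045


def mean_confidence_limits(sample, t_alpha):
    """Return (CL1, CL2) = (X-bar - t*s_xbar, X-bar + t*s_xbar)."""
    n = len(sample)
    xbar = statistics.mean(sample)
    s_xbar = statistics.stdev(sample) / n ** 0.5  # standard error of the mean
    return xbar - t_alpha * s_xbar, xbar + t_alpha * s_xbar


def miss_rate(reps=2000, seed=1):
    """Fraction of simulated experiments whose interval fails to include MU."""
    rng = random.Random(seed)
    misses = 0
    for _ in range(reps):
        sample = [rng.gauss(MU, SIGMA) for _ in range(N)]
        cl1, cl2 = mean_confidence_limits(sample, T_ALPHA)
        if not (cl1 < MU < cl2):
            misses += 1
    return misses / reps


print(f"proportion of intervals missing the true mean: {miss_rate():.3f}")
```

Changing N requires looking up a new t value for the new degrees of freedom. With fewer replicates, the observed proportion of misses scatters more widely around $\alpha$.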
To compute the limits, the sample mean, $\bar{X}$; the standard error, $s_{\bar{X}}$; and the degrees of freedom, $n - 1$, must be known. A value $t_{\alpha, n-1}$ is obtained from tables of Student's $t$ for $n - 1$ degrees of freedom and two-tailed probability $\alpha$, that is, the value exceeded in absolute value with probability $\alpha$. The computation is

$$CL_1 = \bar{X} - t_{\alpha, n-1}\, s_{\bar{X}}$$

$$CL_2 = \bar{X} + t_{\alpha, n-1}\, s_{\bar{X}}$$

BIOMETRICS - CONFIDENCE INTERVALS

Other confidence limits may be computed, and one additional interval is given in this section: the confidence limits for the true variance, $\sigma^2$. The information needed here is similar to that needed for the mean, namely the estimated variance, $s^2$; the degrees of freedom, $n - 1$; and values from $\chi^2$ tables. The $\chi^2$ values depend on the degrees of freedom and on the probability level, $\alpha$. Writing $\chi^2_p$ for the value of $\chi^2$ (with $n - 1$ degrees of freedom) that is exceeded with probability $p$, the confidence interval is

$$P\left[\frac{(n-1)s^2}{\chi^2_{\alpha/2}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}\right] = 1 - \alpha$$

This will be illustrated for $\alpha = .05$, $n - 1 = 30$, and $s^2 = 5$. Since $\alpha = .05$, $\alpha/2 = 0.025$ and $1 - \alpha/2 = 0.975$; the associated table values are $\chi^2_{.975} = 16.79$ and $\chi^2_{.025} = 46.98$. Thus the probability statement for the variance in this case is

$$P\left[\frac{(30)(5)}{46.98} \le \sigma^2 \le \frac{(30)(5)}{16.79}\right] = P\left[3.19 \le \sigma^2 \le 8.93\right] = .95$$

7.0 LINEAR REGRESSION AND CORRELATION

7.1 Basic Concepts

It is often desired to investigate relationships between variables, e.g., rate of change of biomass and concentration of some nutrient; mortality per unit of time and concentration of some toxic substance; chlorophyll and biomass; or growth rate and temperature. As biologists, we appreciate the incredible complexity of the real-world relationships between such variables, but at the same time we may wish to investigate the desirability of approximating these relationships with a straight line.
Such an approximation may prove invaluable if used judiciously within the limits of the conditions where the relation holds. It is important to recognize that no matter how well the straight line describes the data, a causal relationship between the variables is never implied. Causality is much more difficult to establish than mere description by a statistical relation.

When studying the relationship between two variables, the data may be taken in one of two ways. One way is to measure two variables, e.g., measure dry weight biomass and an associated chlorophyll measurement. Where two variables