SOME STATISTICAL TOOLS IN HYDROLOGY
are expected to influence the dependent variable, (2) describing these factors quantitatively, (3) selecting the regression model, (4) computing the regression equation, the standard error of estimate, and the significance of the regression coefficients, and (5) evaluating the results.
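Step (4) can be sketched numerically. The following is a minimal illustration, not a prescribed procedure: the function name and sample data are hypothetical, ordinary least squares is assumed, and significance is summarized only by t-statistics.

```python
import numpy as np

def regression_summary(X, y):
    """Ordinary least squares: regression coefficients, standard
    error of estimate, and t-statistics for each coefficient."""
    n, k = X.shape
    A = np.column_stack([np.ones(n), X])           # add intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    se_est = np.sqrt(resid @ resid / (n - k - 1))  # standard error of estimate
    cov = se_est**2 * np.linalg.inv(A.T @ A)       # coefficient covariance matrix
    t_stats = coef / np.sqrt(np.diag(cov))         # for judging significance
    return coef, se_est, t_stats

# Illustrative data: annual runoff driven mainly by annual precipitation.
rng = np.random.default_rng(0)
precip = rng.uniform(20, 60, size=40)
runoff = 0.6 * precip - 4.0 + rng.normal(scale=1.0, size=40)
coef, se_est, t_stats = regression_summary(precip[:, None], runoff)
```

A large t-statistic on the precipitation coefficient would indicate a significant regression coefficient; the standard error of estimate measures the scatter of runoff about the fitted equation.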
Selection of the appropriate factors should not be a statistical problem, but statistical concepts must enter into the process. If the analyst merely wants to know the relation of annual precipitation to annual runoff, he can proceed directly to selection of a model. But if his problem is to make the best possible estimate of runoff, he will include other factors, some of which may be related to each other as well as to runoff. The problem of determining if certain factors are related to the dependent variable requires careful selection of indices describing these factors quantitatively. These indices should accurately reflect the effects, and no two should describe the same thing. It is a characteristic of regression that if a factor is related to a dependent variable and this factor is entered in the regression model twice (as two different variables), the effect on the dependent variable will be divided equally between the two. Thus, if the total effect is small, the result of dividing it in two parts may be to produce nonsignificance in each of the parts. Likewise, several closely related variables may compute as nonsignificant, whereas one properly selected index would show a real effect. Thus, the independent variables should be selected with considerable care; the shotgun approach should not be used.
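The equal-division behavior described above can be reproduced numerically. In the sketch below (variable names and data are illustrative, not from the text), the same predictor is entered twice; the design matrix is then rank deficient, and the minimum-norm least-squares solution splits the effect equally between the duplicates.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
x = rng.normal(size=n)
y = 3.0 * x + rng.normal(scale=0.1, size=n)

# Predictor entered once: the slope is estimated near its full value.
b_single, *_ = np.linalg.lstsq(np.column_stack([np.ones(n), x]), y, rcond=None)

# Same predictor entered twice (as "two different variables"): the
# minimum-norm least-squares solution divides the effect equally
# between the two duplicate columns.
b_twice, *_ = np.linalg.lstsq(np.column_stack([np.ones(n), x, x]), y, rcond=None)

print(b_single[1])             # full slope
print(b_twice[1], b_twice[2])  # each about half of the full slope
```

With exact duplicates the normal equations are singular; `np.linalg.lstsq` resolves the ambiguity by returning the minimum-norm solution, which is the equal split the text describes. With two closely (rather than perfectly) related variables, the split is approximate and each coefficient may fall below significance.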
Another consideration in selection of variables is to avoid having a variable, or a part thereof, on both sides of the equation. Such a condition may be acceptable for certain problems, but the results must be evaluated carefully. A spurious relation may result, or the relation may be correct but its reliability difficult to assess. Benson (1965) described ways in which spurious relations may be built into a regression.
The user of the regression method should understand the effect of related independent variables on the computed regression coefficients. If the independent variables are entirely unrelated, the simple regression coefficients and the corresponding partial regression coefficients would be the same. However, such conditions rarely occur in nature. The multiple regression method provides a way of separating the total effect of the independent variables into the effect of each independent variable and an unexplained effect. Consider the simple regression

Y = a + b₁X₁ ± error,     (1)

where Y also is affected by another variable, X₂, which is related to X₁. The regression using X₁ and X₂ will be

Y = a′ + b₁′X₁ + b₂X₂ ± error,     (2)

where b₁′ ≠ b₁. If X₁ and X₂ are the only variables affecting Y (and the effects are linear), then equation 2 completely describes Y, and b₁′ and b₂ are the true values of the regression coefficients (except for sampling errors). If X₁ and X₂ are positively correlated with each other and with Y, consider the effect on the magnitude of b₁. For each value of X₁ in equation 1, Y will appear to be more closely related than it actually is because X₂ increases with X₁ and its influence on Y is real though unmeasured. Therefore the regression coefficient b₁ is larger than its true value b₁′.
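The inflation of b₁ can be illustrated with a small simulation (all numbers below are illustrative assumptions, not from the text): Y depends on two positively correlated variables, and the simple regression on X₁ alone absorbs part of the influence of X₂ into its coefficient.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(scale=0.5, size=n)  # X2 positively correlated with X1
y = 1.0 + 2.0 * x1 + 1.5 * x2 + rng.normal(scale=0.3, size=n)

# Multiple regression (equation 2): recovers the true partials b1' and b2.
A_full = np.column_stack([np.ones(n), x1, x2])
coef_full, *_ = np.linalg.lstsq(A_full, y, rcond=None)

# Simple regression (equation 1): the unmeasured influence of X2 is
# folded into the coefficient of X1, so b1 exceeds its true value b1'.
A_simple = np.column_stack([np.ones(n), x1])
coef_simple, *_ = np.linalg.lstsq(A_simple, y, rcond=None)

print(coef_full[1])    # near the true partial, 2.0
print(coef_simple[1])  # inflated, near 2.0 + 1.5 * 0.8 = 3.2
```

The inflation equals the true coefficient of X₂ times the regression of X₂ on X₁, which is why the bias vanishes when the two variables are unrelated.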
Similar changes in b₁′ and b₂ would occur if another factor, related to X₁, X₂, and Y, were included in the regression. These changes in the magnitudes of the regression coefficients due to addition or deletion of a variable are characteristic of regression. They are sometimes interpreted as indicating that partial regression coefficients have no physical meaning. Such interpretations are not necessarily correct. If the variables used in the regression are selected on physical principles and the effect of each of the variables is appreciable, then the partial regression coefficients should be in accord with physical principles. In fact, it is good practice to compare the sign and the general magnitude of each partial regression coefficient with that expected. Benson (1962, p. 52-55) made a thorough comparison of this kind.
The regression coefficients of certain variables may change sign when another related variable is added to or deleted from the regression. This effect may result because (1) the variable is not a good index of the physical feature represented, (2) the effect of the var-