correlated with the dependent series) is chosen by one of several techniques usually provided. A single regression equation is used to fill in values along the time series as long as all of the independent series have concurrent records. When a gap in one of these series is reached, the equation must be re-evaluated to include a different series with a concurrent record at that time step.

A brief review of available statistical packages showed that multiple regression is the most common data filling and extension technique employed. Regression, and by extension its use for data filling and extension, is only one of the many statistical computation and analysis tools these packages provide. Most include, at the very least, graphing capabilities, hundreds of statistical computations, and numerous analysis tools and techniques. The packages reviewed included Statistica, SPSS 8.0, StatView v4.5, DataDesk, InStat, Statgraphics, S-Plus, Minitab, NCSS, and SimStat. Generally, the capabilities of these packages far exceed the needs of the CRDSS project, although they could reasonably be employed as part of the system.

A disadvantage of using regression for data filling and extension is that it tends to reduce the variance of the new data, because the fitted values carry only the portion of the variance explained by the independent series. There are ways to combat this, including adding noise to the model or using maintenance of variance, instead of minimum error, as the criterion for fitting the regression model. The latter method is referred to as Maintenance of Variance Extension (MOVE). While these two techniques succeed in maintaining the variance of the original data, each has drawbacks. Adding noise limits the reproducibility of the new data because of the stochastic component added to the model, and MOVE has been shown in some cases to overestimate the variance and to inflate the interstation correlation with the independent gage.
Should regression techniques be used for data extension and filling, a statistical comparison of the original and new data series should be performed to confirm that the statistics are satisfactorily maintained. The preliminary review of statistical packages for this study did not go into enough depth to determine whether these techniques are available in most of the packages.

The second method uses a multivariate approach to data filling, where multivariate refers to several stations or gages rather than to more than one parameter (Salas et al. 1994). The multivariate model takes into account the cross-correlations between a set of series with missing values and a set of series with complete records. The missing values at the dependent stations at time t+1 are described by a linear combination of the values at those stations at time t and the values at the independent stations at times t and t+1. The equation takes the following form:

$$Y_{t+1} = X_t A + s_{t+1} B$$

The linear parameter matrices A and B are estimated using standard time series estimation procedures, and s is a stochastic component introduced to maintain variance. Designating p as the number of dependent gages and m as the number of independent gages, Y will be a 1 x p matrix, A will be a (p+2m) x p matrix, and B will be a p x p matrix. For the dimensions to agree, X_t is the 1 x (p+2m) row vector combining the p dependent-gage values at time t with the m independent-gage values at each of times t and t+1, and s is a 1 x p vector.

The multivariate method is advantageous in that several short or incomplete records can be filled at once. All of the gages (independent and dependent) are used to estimate missing values, taking into account lag-1 autocorrelations in the incomplete or short records. The disadvantages are that all periods

Appendix E E-19