of record for the independent stations must be complete and concurrent, and the inclusion of a stochastic component limits reproducibility of the new data.

The third and final method is the mixed station approach (Alley and Burns 1983). Simple linear regression is used to fill or extend data, but for each missing value a different independent gage may be selected, depending on which one minimizes the prediction error. This allows a number of gages with different periods of record, and thus the best available data, to be used in extending or filling the incomplete record. The model automatically selects the best gage to regress against, in terms of the highest seasonal or annual correlation, making it a good choice where the use of several partially complete time series is desired. One criticism of this approach is that gage selection is automatic, making the analysis difficult to reproduce outside of the model. However, this may also be the case with multiple regression, where the equation will change when a gap is reached in one of the independent series.

The three approaches presented could all be implemented as part of the CRDSS database extension. The multivariate model is probably the least desirable because of the concurrent record requirement, which may be difficult to meet in some situations. Multiple regression techniques would work well and are widely used, but may not provide the best regression available because they take into account the correlation of several gages, whereas the mixed station approach uses the most highly correlated gage. Implementation of a multiple regression technique would require either writing new code or including a statistical package far more complex than the needs of the CRDSS.
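The gage selection idea behind the mixed station approach can be illustrated with a minimal Python sketch. This is not the USGS program: the function names (fit_line, mixed_station_fill), the min_overlap parameter, and the use of the highest absolute correlation over the concurrent record as a stand-in for minimum prediction error are all assumptions made for illustration. Seasonal correlations and the noise and MOVE options are not covered.

```python
import math


def fit_line(x, y):
    """Least-squares fit y = a + b*x. Returns (a, b, r)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    r = sxy / math.sqrt(sxx * syy)
    return a, b, r


def mixed_station_fill(target, candidates, min_overlap=3):
    """Fill None entries in `target` from the best-correlated candidate gage.

    target     : list of flows, None where missing
    candidates : dict of gage name -> list of flows (same length, None where missing)
    min_overlap: hypothetical minimum number of concurrent values needed to fit
    Returns (filled record, dict of index -> gage name used).
    """
    # Fit one simple linear regression per candidate over its concurrent record.
    fits = {}
    for name, series in candidates.items():
        pairs = [(x, y) for x, y in zip(series, target)
                 if x is not None and y is not None]
        if len(pairs) < min_overlap:
            continue
        xs, ys = zip(*pairs)
        if len(set(xs)) < 2 or len(set(ys)) < 2:
            continue  # a constant series cannot be regressed
        fits[name] = fit_line(xs, ys)

    filled = list(target)
    used = {}
    for i, value in enumerate(target):
        if value is not None:
            continue
        # Only gages with a value at this time step can be used here.
        available = [n for n in fits if candidates[n][i] is not None]
        if not available:
            continue  # the gap stays unfilled
        best = max(available, key=lambda n: abs(fits[n][2]))
        a, b, _ = fits[best]
        filled[i] = a + b * candidates[best][i]
        used[i] = best
    return filled, used


# Illustrative data only: two candidate gages with different gaps.
target = [10.0, 20.0, None, 40.0, None, 60.0]
candidates = {
    "gage_a": [1.0, 2.0, 3.0, 4.0, None, 6.0],
    "gage_b": [5.0, 9.0, 16.0, 19.0, 26.0, 31.0],
}
filled, used = mixed_station_fill(target, candidates)
```

In this made-up example the first gap is filled from gage_a, which has the stronger correlation, while the second gap falls back to gage_b because gage_a has no value at that time step. Recording which gage was used for each filled value, as the `used` dictionary does, is one way to ease the reproducibility concern noted above.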
The mixed station model is advantageous because it can be implemented for a number of gages with differing periods of record, automatically choosing the record with the best correlation for each value to be filled. It evaluates seasonal and annual correlations, and provides four regression options: simple linear regression, regression with noise, MOVE1, and MOVE2. Additional gages can also be easily added to the model matrix, if necessary.

The preceding discussion leads to the recommendation of the mixed station model for data filling and extension in the CRDSS database. It is suggested that a front end interface be written that will compute basic statistics of the incomplete and filled records. The user will compare these statistics and decide whether they were adequately maintained. If they were, the user will proceed with the model. Otherwise, a different regression technique will be chosen and the process repeated. Within this subroutine, the stochastic generation of data could also be included in the same manner. This approach will allow the implementation of one model for data filling and extension in the CRDSS, while providing several options for statistical maintenance at the user's discretion.

A program to run the mixed station approach was written by the USGS in 1989. It was later converted for use on a PC, but the regression plus noise option was omitted from that conversion. It has not been modified since, and Ayres Associates has obtained a copy of the program and code.

The following information is taken directly from literature accompanying the USGS model. The model is written in Fortran 77 and runs on RM/Fortran Versions 2.11 and 2.43 (and possibly earlier versions). Apparently, the program will not run using other Fortran software without some modifications to the source code.
It is dimensioned for as many as 23 stations and 83 years of record, although these limits can be changed easily. Depending on the magnitude of flows, the user may wish to change 6f7.1 on format statement 2090 to a different format. This format statement is used to output the extended record to File 11. The program requires three files: (1) File 11, the output file for the extended flow record; (2) File 10, the file where input data are stored; and (3) File 20, the output file for summary of extension