CRDSS_Task10-2_EvaluateExtensionHistoricalData

of record for the independent stations must be complete and concurrent, and the inclusion of a stochastic component limits reproducibility of the new data.

The third and final method is the mixed station approach (Alley and Burns 1983). Simple linear regression is used to fill or extend data, but for each missing value a different independent gage may be selected, depending on which one minimizes the prediction error. This allows a number of gages with different periods of record, and thus the best available data, to be used in extending or filling the incomplete record. The model automatically selects the best gage to regress against in terms of the highest seasonal or annual correlation, making it a good choice where the use of several partially complete time series is desired. One criticism of this approach is that gage selection is automatic, making the analysis difficult to reproduce outside of the model. However, this may also be the case with multiple regression, where the equation will change when a gap is reached in one of the independent series.

The three approaches presented could all be implemented as part of the CRDSS database extension. The multivariate model is probably the least desirable because of the concurrent record requirement, which may be difficult to meet in some situations. Multiple regression techniques would work well and are widely used, but may not provide the best regression available because they take into account the correlation of several gages, whereas the mixed station approach uses the most highly correlated gage. Implementation of a multiple regression technique would require either writing new code or including a statistical package far more complex than the needs of the CRDSS. The mixed station model is advantageous because it can be implemented for a number of gages with differing periods of record, automatically choosing the record with the best correlation for each value to be filled.
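The gage selection described above can be illustrated with a minimal sketch. This is not the USGS program: the function names (`fit_line`, `fill_mixed_station`), the `min_pairs` threshold, and the use of a single overall correlation in place of seasonal or annual correlations and a prediction-error criterion are all assumptions made for illustration.

```python
from math import sqrt


def fit_line(x, y):
    """Least-squares slope, intercept, and correlation for paired samples."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r


def fill_mixed_station(target, candidates, min_pairs=3):
    """Fill None entries in `target` from the best-correlated available gage.

    target     -- list of flows, None where the record is missing
    candidates -- dict mapping a gage name to a list aligned with `target`
    min_pairs  -- hypothetical minimum number of concurrent values for a fit
    Returns the filled list and a dict of {index: gage name used}.
    """
    # Fit one regression per candidate gage on its concurrent record.
    fits = {}
    for name, series in candidates.items():
        pairs = [(x, y) for x, y in zip(series, target)
                 if x is not None and y is not None]
        if len(pairs) < min_pairs:
            continue
        xs, ys = zip(*pairs)
        if len(set(xs)) < 2 or len(set(ys)) < 2:
            continue  # zero variance: correlation is undefined
        fits[name] = fit_line(xs, ys)

    filled = list(target)
    sources = {}
    for i, value in enumerate(target):
        if value is not None:
            continue
        # Only gages that have data at this time step can be used.
        available = [n for n in fits if candidates[n][i] is not None]
        if not available:
            continue  # leave the gap unfilled
        best = max(available, key=lambda n: abs(fits[n][2]))
        slope, intercept, _ = fits[best]
        filled[i] = intercept + slope * candidates[best][i]
        sources[i] = best
    return filled, sources


if __name__ == "__main__":
    record = [10, 20, 30, 40, None, None]
    gages = {"gage_a": [1, 2, 3, 4, 5, None],
             "gage_b": [5, 1, 4, 2, 9, 6]}
    print(fill_mixed_station(record, gages))
```

Returning the gage used for each filled value is a deliberate choice in this sketch: it records the automatic selection so that the analysis can be retraced outside the model, which addresses the reproducibility criticism noted above.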
The mixed station model evaluates seasonal and annual correlations and provides four regression options: simple linear regression, regression with noise, MOVE1, and MOVE2. Additional gages can also be easily added to the model matrix, if necessary.

The preceding discussion leads to the recommendation of the mixed station model for data filling and extension in the CRDSS database. It is suggested that a front end interface be written that will compute basic statistics of the incomplete and filled records. The user will compare these statistics and decide whether they were adequately maintained. If they were, the user will proceed in the model. Otherwise, a different regression technique will be chosen and the process repeated. Within this subroutine, the stochastic generation of data could also be included in the same manner. This approach will allow the implementation of one model for data filling and extension in the CRDSS, while providing several options for statistical maintenance at the user's discretion.

A program to run the mixed station approach was written by the USGS in 1989. It has been converted for use on a PC, but the regression plus noise option was omitted from that conversion. It has not since been modified, and Ayres Associates has obtained a copy of the program and code. The following information is taken directly from literature accompanying the USGS model.

The model is written in Fortran 77 and runs on RM/Fortran Versions 2.11 and 2.43 (and possibly earlier versions). Apparently, the program will not run using other Fortran software without some modifications to the source code. It is dimensioned for as many as 23 stations and 83 years of record, although these limits can be changed easily. Depending on the magnitude of flows, the user may wish to change 6f7.1 on format statement 2090 to a different format. This format statement is used to output the extended record to file 11.
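To see why the magnitude of flows matters for the 6f7.1 format, the sketch below emulates the Fortran F7.1 edit descriptor in Python. Each value gets a field seven characters wide with one decimal place, so the largest value that fits is 99999.9; standard Fortran fills the field with asterisks when a value is too wide. The helper names `fortran_f` and `format_record` are hypothetical, and details such as whether a leading zero is printed for values below 1 vary by compiler and are not reproduced here.

```python
def fortran_f(value, width=7, decimals=1):
    """Approximate Fortran Fw.d output: right-justified, asterisks on overflow."""
    text = f"{value:.{decimals}f}"
    if len(text) > width:
        return "*" * width  # the value does not fit in the field
    return text.rjust(width)


def format_record(values, per_line=6, width=7, decimals=1):
    """Lay out values the way a repeated edit descriptor such as 6f7.1 would."""
    lines = []
    for start in range(0, len(values), per_line):
        chunk = values[start:start + per_line]
        lines.append("".join(fortran_f(v, width, decimals) for v in chunk))
    return lines


if __name__ == "__main__":
    flows = [123.4, 99999.9, 100000.0, 0.5, 2500.0, 12.0, 7.3]
    for line in format_record(flows):
        print(line)
    # A wider field, for example f9.1, would hold the larger value:
    print(fortran_f(100000.0, width=9))
```

In this sketch a flow of 100000.0 overflows the seven-character field, which is the situation in which the format on statement 2090 would need to be widened.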
The program requires three files: (1) File 11, the output file for extended flow record; (2) File 10, the file where input data are stored; and (3) File 20, the output file for summary of extension