TSTool-Vol2-CommandReference

the command. • EnsembleID – all time series in the ensemble will be modified. • FirstMatchingTSID – the first time series that matches the TSID (single TSID or TSID with wildcards) will be modified. • LastMatchingTSID – the last time series that matches the TSID (single TSID or TSID with wildcards) will be modified. • SelectedTS – the time series are those selected with the SelectTimeSeries() command. | AllTS
TSID | The time series identifier or alias for the time series to be modified. Use the * wildcard character to match multiple time series. | Required for TSList=*TSID.
EnsembleID | The ensemble to be modified, if processing an ensemble. | Required for TSList=EnsembleID.
IndependentTSID | The time series identifier or alias for the independent time series. | None – must be specified.
FillStart | The starting date/time for the fill. | Available period.
FillEnd | The ending date/time for the fill. | Available period.
FillFlag | A one-character flag to tag data values that are filled. | None – do not flag filled data.
FillDirection | Specify the direction of the fill as Forward or Backward. | Forward
FactorMethod | Specify how to calculate the factor to use in proration, one of: • AnalyzeAverage – calculate the factor of the average of the time series divided by the independent time series, using the analysis period. • NearestPoint – calculate the factor at the nearest point where both time series have non-missing values. | NearestPoint
AnalysisStart | The starting date/time for the analysis, used when FactorMethod=AnalyzeAverage. | Analyze the full period.
AnalysisEnd | The ending date/time for the analysis, used when FactorMethod=AnalyzeAverage. | Analyze the full period.
InitialValue | The initial value to use for the filled time series, for cases where a value may not be available on the ends of the fill period, one of: • NearestBackward – search the time series backward for the nearest non-missing value. • NearestForward – search the time series forward for the nearest non-missing value. • Specify a number to use for the initial value. | None – filling will not occur at the end.

A sample command file to fill data from the State of Colorado’s HydroBase database is as follows:

# 06754000 - SOUTH PLATTE RIVER NEAR KERSEY
06754000.DWR.Streamflow.Month~HydroBase
# 06694700 - FOURMILE CREEK NEAR FAIRPLAY, CO.
06694700.USGS.Streamflow.Month~HydroBase
FillProrate(TSList=AllMatchingTSID,TSID="06754000.DWR.Streamflow.Month",IndependentTSID="06694700.USGS.Streamflow.Month",FillDirection=Forward,InitialValue=0)
06754000.DWR.Streamflow.Month~HydroBase

Command Reference: FillRegression()
Fill missing time series data using ordinary least squares regression
Version 10.21.00, 2013-07-14

The FillRegression() command fills missing data in a time series using ordinary least squares (OLS) regression and provides a variety of options for transforming the data and controlling the analysis. In OLS regression, the vertical distance from the data point to the regression line is minimized. OLS regression provides the minimum-variance estimate for a single value or observation. However, if an ensemble of points is estimated from OLS regression, the estimated values will have lesser variability than the true values.

[Figure: Ordinary Least Squares (OLS). The fit minimizes the vertical deviation of each point (Xi, Yi) from the regression line; axes are X and Y.]

See also the FillMOVE2() command, which utilizes additional variance from independent time series to determine the regression relationship, and the FillMixedStation() command, which automates the analysis of many time series to determine a “best estimate” filling approach.
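The reduced variability of OLS estimates noted above can be illustrated with a short Python sketch. This is not TSTool code: the data are synthetic and the helper names (ols_fit, sample_std) are hypothetical. It fits a line to noisy paired data and compares the standard deviation of the estimated values with that of the observed values; for an OLS fit, the former equals the latter multiplied by the absolute value of the correlation coefficient.

```python
import math
import random


def ols_fit(x, y):
    """Return (a, b, r) for the least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx                   # slope
    a = mean_y - b * mean_x         # intercept
    r = sxy / math.sqrt(sxx * syy)  # correlation coefficient
    return a, b, r


def sample_std(values):
    """Sample standard deviation (divides by n - 1)."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))


# Synthetic (hypothetical) paired data: y depends on x plus random noise.
rng = random.Random(42)
x = [rng.uniform(0.0, 100.0) for _ in range(200)]
y = [5.0 + 0.8 * xi + rng.gauss(0.0, 15.0) for xi in x]

a, b, r = ols_fit(x, y)
y_est = [a + b * xi for xi in x]

std_observed = sample_std(y)
std_estimated = sample_std(y_est)
print(f"r = {r:.3f}")
print(f"std of observed values : {std_observed:.2f}")
print(f"std of estimated values: {std_estimated:.2f}")
```

Because the estimated values lie exactly on the regression line, their spread shrinks as the correlation weakens, which is one reason a long run of filled values can look smoother than measured data.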
Regression can be applied only to regular interval time series. The dependent time series will be filled using the independent time series. The periods of record and output period for the time series should be verified to make sure that the time series periods overlap sufficiently. Regression relationships are developed using the analysis period for the time series and are applied to the fill period. Refer to the output statistics table, log file, and time series properties for analysis details. Several parameters are available to ensure that filling uses reasonable relationships. This command has functionality that may not be needed for simple analysis but which is useful for software testing and comparison with the FillMixedStation() command.

Important: TSTool does allow filled values to be flagged. However, other commands do not exclude these values from computations when determining relationships for subsequent fill steps. Therefore, it is important to perform regression data filling as early in data processing as possible so that data manipulation does not introduce derived values and bias.
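The distinction between the analysis period (where the relationship is developed) and the fill period (where it is applied) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the TSTool implementation: the function name fill_by_regression, the use of None for missing values, the index-based periods, and the flag character "R" are all hypothetical.

```python
def fill_by_regression(dependent, independent, analysis, fill, flag="R"):
    """Fill missing (None) dependent values from the independent series.

    dependent, independent: equal-length lists on the same regular interval.
    analysis, fill: (start, end) index ranges, with end exclusive.
    Returns (filled_values, flags, a, b) for the line y = a + b*x.
    """
    # Overlapping non-missing pairs inside the analysis period.
    pairs = [
        (independent[i], dependent[i])
        for i in range(*analysis)
        if independent[i] is not None and dependent[i] is not None
    ]
    if len(pairs) < 2:
        raise ValueError("need at least 2 overlapping values")

    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    if sxx == 0.0:
        raise ValueError("independent values are constant; slope is undefined")
    b = sum((x - mean_x) * (y - mean_y) for x, y in pairs) / sxx
    a = mean_y - b * mean_x

    # Apply the relationship only inside the fill period, and only where the
    # dependent value is missing and the independent value is available.
    filled = list(dependent)
    flags = [""] * len(dependent)
    for i in range(*fill):
        if filled[i] is None and independent[i] is not None:
            filled[i] = a + b * independent[i]
            flags[i] = flag
    return filled, flags, a, b


# Hypothetical data in which the known values follow y = 2x + 1.
independent = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]
dependent = [21.0, None, 61.0, 81.0, None, None]

filled, flags, a, b = fill_by_regression(
    dependent, independent, analysis=(0, 4), fill=(0, 5)
)
print(filled)
print(flags)
```

In this sketch the last value stays missing because it lies outside the fill period. The flags make it possible to exclude estimated values from a later analysis, which the command itself does not do automatically.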
The following OLS equation is used to estimate values for the dependent time series from the independent time series:

$$Y_i = \bar{Y}_1 + r \frac{S_{y1}}{S_{x1}} \left( X_i - \bar{X}_1 \right) \quad \text{or} \quad Y_i = a + b X_i$$

where

$N_1$ = concurrent or overlapping period of record (the notation N1 is used because the MOVE2 fill technique refers to N2, which is the number of additional points outside of N1 in the independent time series)

$\bar{X}_1$ = mean for independent variable for N1 years $= \frac{1}{N_1} \sum X_i$

$\bar{Y}_1$ = mean for dependent variable for N1 years $= \frac{1}{N_1} \sum Y_i$

$S_{y1}$ = standard deviation for N1 years $= \sqrt{\frac{1}{N_1 - 1} \sum \left( Y_i - \bar{Y}_1 \right)^2}$

$S_{x1}$ = standard deviation for N1 years $= \sqrt{\frac{1}{N_1 - 1} \sum \left( X_i - \bar{X}_1 \right)^2}$

$r = R$ = correlation coefficient $= \frac{N_1 \sum X_i Y_i - \sum X_i \sum Y_i}{\sqrt{\left[ N_1 \sum X_i^2 - \left( \sum X_i \right)^2 \right] \cdot \left[ N_1 \sum Y_i^2 - \left( \sum Y_i \right)^2 \right]}}$

$b = r \frac{S_{y1}}{S_{x1}}$

$a = \bar{Y}_1 - b \bar{X}_1$

The correlation coefficient, r, is used to compute the slope, b, of the line.

A number of statistics are computed and are available for output to a table, as described below (see the TableID and related command parameters for how to specify the table output). Creating a statistics table and then writing the table to a file is useful for checking the analysis and software. For example, the CompareTables() command can be used to compare this statistics table with a verification data set that is calculated by another tool. In the following descriptions, the statistic for one equation has a name like Mean and monthly statistics correspondingly have a name like Mean_1, where 1 corresponds to January and 12 to December.

In some cases, statistics are relevant in units of the raw values, in some cases statistics are relevant in transformed (log10) units, and in some cases both are relevant. For example, if the log10 transform is used to compute the relationship, then a and b are in transformed units.
However, error computations between the original data values and values that would be computed by the relationship are in the raw units (regardless of whether the data were transformed) – this allows errors to be compared between relationships using raw and transformed values (the FillMixedStation() command uses this information to compare relationships). Consequently, the third column of the following table indicates whether statistics are provided in raw (column name uses statistic only) or transformed units (additional _trans added to statistic for column name). Therefore, if the statistic is unitless, it will never have the _trans addition. If the analysis does not use a transformation, then _trans will be omitted from column headings.

Statistics From Regression Analysis

Statistic (Table Column Name) | Involves Dependent, Independent, or Both | Statistics Output in Raw or Transformed Units | Description
N1 | Both | N/A – unitless | The number (count) of non-missing data values overlapping in the dependent and independent time series.
MeanX1 | Independent | raw, transformed | The mean of the independent N1 data values.
SX1 | Independent | raw, transformed | The standard deviation of the independent N1 values.
N2 | Independent | N/A – unitless | The number (count) of non-missing independent values outside of N1.
MeanX2 | Independent | raw, transformed | The mean of the independent N2 values.
SX2 | Independent | raw, transformed | The standard deviation of the independent N2 values.
MeanY1 | Dependent | raw, transformed | The mean of the dependent N1 values.
SY1 | Dependent | raw, transformed | The standard deviation of the dependent N1 values.
NY | Dependent | N/A – unitless | The total number of non-missing dependent values.
MeanY | Dependent | raw, transformed | The mean of the dependent NY values.
SY | Dependent | raw, transformed | The standard deviation of the dependent NY values.
SkewY | Dependent | raw, transformed | The skew, or non-symmetry, of the dependent NY values.
a | Both | transformed | The intercept for the relationship equation.
b | Both | transformed | The slope of the relationship equation.
R | Both | transformed | The correlation coefficient for N1 values.
R2 | Both | transformed | R-squared, coefficient of determination for N1 values.
MeanY1est | Dependent | raw, transformed | The mean for N1 values computed from the relationship (estimate the dependent values where values were previously known).
SY1est | Dependent | raw, transformed | The standard deviation for N1 values computed from the relationship (estimate the dependent at locations where values are known).
RMSE | Dependent | raw, transformed | The “root mean squared error” for N1 overlapping values, which is a measure of the overall error of using the regression equation to estimate values, calculated as: $RMSE = \sqrt{\frac{\sum \left( Y_{1i} - Y'_{1i} \right)^2}{N_1}}$ where $Y_{1i}$ is the original dependent value and $Y'_{1i}$ is the value estimated with the regression relationship.
SEE | Dependent | raw, transformed | The standard error of estimate for N1 overlapping values, which is a measure of the overall error of using the regression equation to estimate values, calculated as: $SEE = \sqrt{\frac{\sum \left( Y_{1i} - Y'_{1i} \right)^2}{N_1 - 2}}$ where $Y_{1i}$ is the original dependent value and $Y'_{1i}$ is the value estimated with the regression relationship.
SEP | Both | raw | The standard error of prediction for each estimated value, calculated as: $SEP_i = SEE \cdot \sqrt{1 + \frac{1}{N_1} + \frac{\left( X_{1i} - \bar{X}_1 \right)^2}{\sum \left( X_{1i} - \bar{X}_1 \right)^2}}$ where $X_{1i}$ is the original independent value and $\bar{X}_1$ is the mean of the N1 independent values. Note that when using the mixed station analysis in the FillMixedStation() command, this value may be used to determine the relationship. The SEP is not actually output in the statistics table but may be added as an optional output time series in the future.
SESlope | Both | N/A – unitless | The standard error (SE) of the slope (b) for N1 overlapping values, calculated as: $SE = \sqrt{\frac{\sum \left( Y_{1i} - Y'_{1i} \right)^2 / \left( N_1 - 2 \right)}{\sum \left( X_{1i} - \bar{X}_1 \right)^2}}$ where $X_{1i}$ is the original independent value and $\bar{X}_1$ is the mean of the N1 independent values; $Y_{1i}$ is the original dependent value and $Y'_{1i}$ is the value estimated with the regression relationship.
TestScore | Both | N/A – unitless | b/SESlope
TestQuantile | Both | N/A – unitless | The value at which the confidence interval is satisfied. Comes from the Student’s T-test, which is a function of the confidence interval and the degrees of freedom (DF), where DF = N1 – 2 (corresponding to the intercept and the slope of the regression equation).
TestOK | Both | N/A – unitless | Will be Yes if TestScore >= TestQuantile, indicating that the data are related (b ≠ 0), and No if TestScore < TestQuantile, indicating that the data are not related. If the data are not related, then the relationship between the dependent and independent time series will not be used for filling.
SampleSizeOK | Both | N/A – unitless | Will be No if N1 < MinimumSampleSize and Yes if N1 >= MinimumSampleSize, indicating whether the number of overlapping points meets the required minimum.
ROK | Both | N/A – unitless | Will be No if R < MinimumR, indicating that the correlation is below the minimum threshold, and Yes if R >= MinimumR, indicating that the correlation meets the minimum threshold.
NYfilled | Dependent | N/A – unitless | The total number of missing points in the dependent time series that were filled through the regression.
MeanYfilled | Dependent | raw | The mean of the values that were used to fill missing points.
SYfilled | Dependent | raw | The standard deviation of the values that were used to fill missing points.
SkewYfilled | Dependent | raw | The skew, or non-symmetry, of the values that were used to fill missing points.

Student’s T-distribution (http://en.wikipedia.org/wiki/Student's_t-distribution) is similar to the standard normal distribution but has heavier tails, so it has a higher probability of producing outliers. Using the Apache Math library (http://commons.apache.org/proper/commons-math/javadocs/api-3.2/index.html), the appropriate distribution for the size of the dataset is generated, and the value at which the desired confidence level is satisfied is calculated. For example, if the desired confidence level is .8 and the size of the dataset is seven, then following this graph of the Student’s T-distribution, values above approximately one would satisfy the confidence level.

[Figure FillRegression_StudentTTest: Student’s T-Test Example]

The following dialog is used to edit the command and illustrates the syntax of the command:

[Figure FillRegression: FillRegression() Command Editor]

The command syntax is as follows:

FillRegression(Parameter=Value,…)

Command Parameters

Parameter | Description | Default
TSID | The time series identifier or alias for the time series to be filled. | None – must be specified.
IndependentTSID | The time series identifier or alias for the independent time series. | None – must be specified.
NumberOfEquations | The number of equations to use for the analysis: OneEquation or MonthlyEquations. | OneEquation
AnalysisMonth | Indicate the month to process when using monthly equations. Currently only a single month can be specified. | Process all months.
Transformation | Indicates how to transform the data before analyzing. Specify as None (previously Linear) or Log (for Log10). If the Log option is used, zero and negative values are replaced with the value specified by the LEZeroLogValue parameter for analysis (missing data values are ignored in the analysis). | None (no transformation).
LEZeroLogValue | Value to use for data values less than or equal to zero when using a log transformation. The Log10 of this value will be used in calculations. | .0010
Intercept | Specify as 0 to force the intercept of the best-fit line through the origin (not available for log transformation). | Optional; if not specified, the intercept is not forced through zero.
AnalysisStart | The date/time to start the analysis – use to focus on only a period appropriate for analysis. For example, specify the unregulated period for streamflow. | Analyze the full period.
AnalysisEnd | The date/time to end the analysis – use to focus on only a period appropriate for analysis. | Analyze the full period.
MinimumSampleSize | The minimum number of overlapping values required to use a relationship for filling. | 2, due to requirements in calculating the statistics.
MinimumR | The minimum correlation coefficient required to use a relationship for filling. | No check is performed.
ConfidenceInterval | A confidence interval in percent (e.g., 95) required for the slope of the relationship. The T-test is performed to ensure that the independent and dependent time series are related. | The T-test is not performed to evaluate the confidence interval.
Fill | Indicate whether fill should occur (True) or just analyze to compute statistics (False). The latter is useful for testing combinations of fill parameters prior to actually performing filling. | True
FillStart | The date/time to start filling, if other than the full time series period. | Fill the full period.
FillEnd | The date/time to end filling, if other than the full time series period. | Fill the full period.
FillFlag | A single character that will be used to flag filled data. | Filled values will not be flagged.
FillFlagDesc | Description for the fill flag, used in reports. | Automatically generated.
TableID | A table identifier for a table to receive output of the regression analysis (statistics are described above). | Statistics are not written to the table. Refer to the log file for information.
TableTSIDColumn | The name of the column in the table that contains time series identifier information. This is used to match the table with time series being analyzed so that statistics can be written to the correct row. | Required if TableID is specified.
TableTSIDFormat | The specifier used to format the time series identifier in the TableTSIDColumn. The location part of the TSID, or the time series alias, is typically used. | The alias will be used if available, or otherwise the full TSID will be used.
SEPTSID | The time series identifier of the SEP time series, calculated for ALL values in the analysis period. This parameter is not enabled but is envisioned to help evaluate filling and test FillMixedStation(). | If not specified, no SEP time series will be generated.
SEPTSAlias | The alias to be assigned to the SEP time series. This parameter is not yet enabled. | No alias is assigned to the SEP time series.
FlagToWarn | A parameter is envisioned to warn the user if any values in the time series are flagged with a specific flag value. This will allow checks to ensure that FillRegression() is not used with data that have been filled in a previous step. |

The command logic is as follows, with reference to command parameters that control the process:

1. The dependent (TSID) and independent time series (IndependentTSID) are retrieved using the time series identifiers or aliases.
2. Data arrays of overlapping non-missing values are extracted from time series to be used as the samples for analysis, as specified by command parameters (analysis period specified by AnalysisStart and AnalysisEnd; transformation specified by Transformation, LEZeroLogValue, and Intercept; number of equations specified by NumberOfEquations and AnalysisMonth).
3. The independent and dependent statistics and relationships are calculated, computing as many of the statistics as possible (some are skipped if the sample size results in division by zero). Computing the statistics allows them to be saved in the output table for review, and is controlled by the TableID, TableTSIDColumn, and TableTSIDFormat parameters.
4. The statistics are analyzed to determine if the relationships are acceptable for filling by checking the minimum sample size (MinimumSampleSize), minimum correlation coefficient (MinimumR), and that the relationship meets the confidence interval (ConfidenceInterval). If monthly equations are used, then it is possible that some months can be filled but not others.
5. If Fill=True (the default), then the relationships that are acceptable from step 4 are used to fill the dependent time series for the period specified by the FillStart and FillEnd parameters, with FillFlag and FillFlagDesc optionally being used to indicate filled values.

Command Reference: FillRepeat()
Fill missing time series data by repeating known data values
Version 09.09.00, 2010-09-23

The FillRepeat() command fills missing data in time series by repeating observations until another observation is found. This fill technique is useful, for example, where time series are likely to be stepwise or nearly constant, such as some reservoir and diversion time series.

The following dialog is used to edit the command and illustrates the syntax of the command.
[Figure FillRepeat: FillRepeat() Command Editor]

The command syntax is as follows:

FillRepeat(Parameter=Value,…)

Command Parameters

Parameter | Description | Default
TSList | Indicates the list of time series to be processed, one of: • AllMatchingTSID – all time series that match the TSID (single TSID or TSID with wildcards) will be modified. • AllTS – all time series before the command. • EnsembleID – all time series in the ensemble will be modified. • LastMatchingTSID – the last time series that matches the TSID (single TSID or TSID with wildcards) will be modified. • SelectedTS – the time series are those selected with the SelectTimeSeries() command. | AllTS
FillStart | The starting date/time for the fill. | Available period.
FillEnd | The ending date/time for the fill. | Available period.
FillDirection | Specify the direction of the fill as Forward or Backward. | Forward
MaxIntervals | The maximum number of intervals to fill in a data gap. | Fill all gaps.
Flag | String to flag filled values. Prefix with + to append the string to existing flag values. | Do not flag filled values.

A sample command file to fill a time series from the State of Colorado’s HydroBase is as follows:

# 08236500 - ALAMOSA RIVER BELOW TERRACE RESERVOIR
08236500.DWR.Streamflow.Month~HydroBase
FillRepeat(TSList=AllMatchingTSID,TSID="08236500.DWR.Streamflow.Month",FillDirection=Forward)

Command Reference: FillUsingDiversionComments()
Fill missing time series data using HydroBase diversion comments and structure CIU information
Version 09.07.02, 2010-08-20

This command is only appropriate for use with diversion (e.g., DivTotal, DivClass data types) and reservoir release (e.g., RelTotal, RelClass data types) time series for the HydroBase input type.
The FillUsingDiversionComments() command fills missing data in time series by using diversion comment and structure “currently in use” (CIU) information in HydroBase. This information is used, for example, in cases where Water Commissioners have entered annual data values rather than daily or monthly records.

Diversion Comment Not Used Flag

HydroBase contains diversion comment data with a not_used field. If the not_used value matches one of the values shown in the following table for an irrigation year (November of the previous year to October of the irrigation year), the diversion (or reservoir release) data for the specified irrigation year can be interpreted as zero (see the State of Colorado’s Water Commissioner Manual for more information):

Diversion Comment not_used Flag Resulting in Additional Zero Values

not_used | Meaning (reason why diversion is zero)
A | Structure is not usable
B | No water is available
C | Water available, but not taken
D | Water taken in another structure

Structure Currently in Use Flag

The HydroBase structure data contains a “currently in use” (CIU) field. Unlike diversion comments, this is a single value that is consistent with the current status of a structure (it is not a time series). The following CIU values are used.
Structure CIU Flag Values and Meaning

CIU | Meaning
A | Active structure with contemporary diversion records
B | Structure abandoned by the court
C | Conditional structure
D | Duplicate; ID no longer used
F | Structure used as FROM number; located in another water district
H | Historical structure only – no longer exists or has records, but has historical data
I | Inactive structure which physically exists but no diversion records are kept
N | Non-existent structure with no contemporary or historical records
U | Active structure but diversion records are not maintained

If FillUsingCIU=True is specified for this command, the following logic will be used to fill missing time series values:

1. If the HydroBase CIU value is H or I for the structure associated with the time series:
   a. Fill using the diversion comments (see above for interpretation of comments).
   b. The limits of the time series are recomputed based on diversion data and comments.
   c. Missing data at the end of the period are filled with zeros, reflecting the fact that the structure is off-line. In this case, the limits are always recomputed, regardless of the value of the RecalcLimits command parameter. These values are not included in historical averages because they do not occur in the active life of the structure.
   d. Missing data within the data period remain missing, and can be filled with other commands such as fillHistMonthAverage().
   e. Missing data prior to the first diversion values or comments remain missing, and can be filled with other commands as appropriate, perhaps specific to each location.
2. If in HydroBase CIU=N:
   a. Fill using the diversion comments (see above for interpretation of comments).
   b. The limits of the time series are recomputed based on diversion data and comments.
   c. Missing data at the beginning of the period are filled with zeros.
      In this case, the limits are always recomputed, regardless of the value of the RecalcLimits command parameter.
   d. The remaining missing data in the active data period or at the end of the period remain missing and can be filled with other commands.

The output period for filled time series is handled as follows:

• If a global output period has been specified (e.g., with the setOutputPeriod() command) then the time series will NOT be extended to include diversion comments and CIU codes beyond the output period.
• If NO output period has been specified, the time series WILL be extended to include the longer period from diversion comments. CIU information does not cause the time series to be extended.

After setting additional zero values using this command, the limits of the time series can be recomputed, if appropriate, for use with the fillHistMonthAverage() command (see the RecalcLimits=True parameter). If FillUsingCIU=true is specified, it overrides the RecalcLimits parameter as per the logic described above.

See also the ReadHydroBase() and TS Alias = ReadHydroBase() commands, which allow filling with diversion comments after reading data. Refer to the HydroBase Input Type Appendix for more information about diversion time series.

The following dialog is used to edit the command and illustrates the syntax of the command.

[Figure FillUsingDiversionComments: FillUsingDiversionComments() Command Editor]

The command syntax is as follows:

FillUsingDiversionComments(Parameter=Value,…)

Command Parameters

Parameter | Description | Default
TSID | The time series identifier or alias for the time series to be filled. Specify as * to fill all time series. | None – must be specified.
FillStart | The starting date/time for the fill. | Available period.
FillEnd | The ending date/time for the fill. | Available period.
FillFlag | For each value that is filled using the diversion comment not_used information, tag the filled value as follows: • If FillFlag is specified as a single character, tag filled values with the specified character. • If FillFlag=Auto is specified, the diversion comment not_used value (A, B, C, or D) from HydroBase is used for the flag. The flag can then be used later to label graphs, etc. The flag will be appended to existing flags if necessary. | No flag is assigned.
FillUsingCIU | Indicates whether the “currently in use” (CIU) information is used to fill missing data. This will result in additional zeros at the beginning or end of the time series, depending on CIU value. See the description of the logic above. Note that this will cause the time series data limits to be automatically recomputed, regardless of the value of the RecalcLimits parameter. | False (CIU information is not used to fill missing data).
FillUsingCIUFlag | For each missing data value that is filled using the CIU information, tag the