akshayrai commented on a change in pull request #5740: URL: https://github.com/apache/incubator-pinot/pull/5740#discussion_r461792659
########## File path: thirdeye/thirdeye-frontend/app/utils/yaml-tools.js ########## @@ -29,6 +28,20 @@ rules: type: DATA_SLA # Alert if data is missing. params: sla: 3_DAYS # Data is missing for 3 days since last availability + +# You can mention a cron to allow this detection to be run periodically +# For example, the cron below would execute every 5 minutes. +# cron: "0 0/5 * 1/1 * ? *" + +# Detections in past data +# You can use the lastTimestamp parameter to denote how far back you want the anomalies +# to be detected. This value acts like a checkpoint or high watermark. +# A valid entry is a non negative timestamp value in millis. +# +# - By default, (if not mentioned), the lookback is 30 days +# - A value of 0 (zero) implies complete backfill till the start of data. As shown below. Review comment: Couple of concerns with backfilling all the data: 1. Backfilling all the data can take quite some time, especially if there are years worth of data and possibly even fail due to timeout and other issues. 2. There isn't much benefit of backfilling older data. Users usually don't care about historical anomalies/issues, at least not older than a month. 3. The goal of the 30 day default backfill is to quickly show some stats and recent anomalies as soon as the alert is created. Otherwise, the alert page will look empty and users have to revisit after some time. 4. Typically, users tend to create an alert using the basic settings and update it later on. When they update, anomalies created using the initial setting will no longer make sense. We would increase the default limit from 30 days to say 90 days if that is more reasonable? ########## File path: thirdeye/thirdeye-frontend/app/utils/yaml-tools.js ########## @@ -29,6 +28,20 @@ rules: type: DATA_SLA # Alert if data is missing. params: sla: 3_DAYS # Data is missing for 3 days since last availability + +# You can mention a cron to allow this detection to be run periodically +# For example, the cron below would execute every 5 minutes. +# cron: "0 0/5 * 1/1 * ? *" + +# Detections in past data +# You can use the lastTimestamp parameter to denote how far back you want the anomalies +# to be detected. This value acts like a checkpoint or high watermark. +# A valid entry is a non negative timestamp value in millis. +# +# - By default, (if not mentioned), the lookback is 30 days +# - A value of 0 (zero) implies complete backfill till the start of data. As shown below. Review comment: Couple of concerns with backfilling all the data: 1. Backfilling all the data can take quite some time, especially if there are years worth of data and possibly even fail due to timeout and other issues. 2. There isn't much benefit of backfilling older data. Users usually don't care about historical anomalies/issues, at least not older than a month. 3. The goal of the 30 day default backfill is to quickly show some stats and recent anomalies as soon as the alert is created. Otherwise, the alert page will look empty and users have to revisit after some time. 4. Typically, users tend to create an alert using the basic settings and update it later on. When they update, anomalies created using the initial setting will no longer make sense. We could increase the default limit from 30 days to say 90 days if that is more reasonable? ########## File path: thirdeye/thirdeye-pinot/src/main/java/org/apache/pinot/thirdeye/detection/yaml/YamlResource.java ########## @@ -236,10 +236,17 @@ private void createYamlOnboardingTask(long configId, long tuningWindowStart, lon info.setTuningWindowStart(tuningWindowStart); info.setTuningWindowEnd(tuningWindowEnd); info.setEnd(System.currentTimeMillis()); - info.setStart(info.getEnd() - ONBOARDING_REPLAY_LOOKBACK); + + long lastTimestamp = detectionConfig.getLastTimestamp(); + // If no value is present, set the default lookback + if (lastTimestamp < 0) { Review comment: Since the default value of lastTimestamp will be 0 (when not set), we should probably change this condition here to not trigger detection on the entire dataset. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org