akshayrai commented on a change in pull request #5740:
URL: https://github.com/apache/incubator-pinot/pull/5740#discussion_r461792659



##########
File path: thirdeye/thirdeye-frontend/app/utils/yaml-tools.js
##########
@@ -29,6 +28,20 @@ rules:
     type: DATA_SLA              # Alert if data is missing.
     params:
       sla: 3_DAYS               # Data is missing for 3 days since last 
availability
+
+# You can mention a cron to allow this detection to be run periodically
+# For example, the cron below would execute every 5 minutes.
+# cron: "0 0/5 * 1/1 * ? *"
+
+# Detections in past data
+# You can use the lastTimestamp parameter to denote how far back you want the 
anomalies
+# to be detected. This value acts like a checkpoint or high watermark. 
+# A valid entry is a non negative timestamp value in millis.
+# 
+# - By default, (if not mentioned), the lookback is 30 days
+# - A value of 0 (zero) implies complete backfill till the start of data. As 
shown below.

Review comment:
       Couple of concerns with backfilling all the data:
   1. Backfilling all the data can take quite some time, especially if there 
are years worth of data and possibly even fail due to timeout and other issues.
   2. There isn't much benefit of backfilling older data. Users usually don't 
care about historical anomalies/issues, at least not older than a month.
   3. The goal of the 30 day default backfill is to quickly show some stats and 
recent anomalies as soon as the alert is created. Otherwise, the alert page 
will look empty and users have to revisit after some time.
   4. Typically, users tend to create an alert using the basic settings and 
update it later on. When they update,  anomalies created using the initial 
setting will no longer make sense.
   
   We would increase the default limit from 30 days to say 90 days if that is 
more reasonable?

##########
File path: thirdeye/thirdeye-frontend/app/utils/yaml-tools.js
##########
@@ -29,6 +28,20 @@ rules:
     type: DATA_SLA              # Alert if data is missing.
     params:
       sla: 3_DAYS               # Data is missing for 3 days since last 
availability
+
+# You can mention a cron to allow this detection to be run periodically
+# For example, the cron below would execute every 5 minutes.
+# cron: "0 0/5 * 1/1 * ? *"
+
+# Detections in past data
+# You can use the lastTimestamp parameter to denote how far back you want the 
anomalies
+# to be detected. This value acts like a checkpoint or high watermark. 
+# A valid entry is a non negative timestamp value in millis.
+# 
+# - By default, (if not mentioned), the lookback is 30 days
+# - A value of 0 (zero) implies complete backfill till the start of data. As 
shown below.

Review comment:
       Couple of concerns with backfilling all the data:
   1. Backfilling all the data can take quite some time, especially if there 
are years worth of data and possibly even fail due to timeout and other issues.
   2. There isn't much benefit of backfilling older data. Users usually don't 
care about historical anomalies/issues, at least not older than a month.
   3. The goal of the 30 day default backfill is to quickly show some stats and 
recent anomalies as soon as the alert is created. Otherwise, the alert page 
will look empty and users have to revisit after some time.
   4. Typically, users tend to create an alert using the basic settings and 
update it later on. When they update,  anomalies created using the initial 
setting will no longer make sense.
   
   We could increase the default limit from 30 days to say 90 days if that is 
more reasonable?

##########
File path: 
thirdeye/thirdeye-pinot/src/main/java/org/apache/pinot/thirdeye/detection/yaml/YamlResource.java
##########
@@ -236,10 +236,17 @@ private void createYamlOnboardingTask(long configId, long 
tuningWindowStart, lon
     info.setTuningWindowStart(tuningWindowStart);
     info.setTuningWindowEnd(tuningWindowEnd);
     info.setEnd(System.currentTimeMillis());
-    info.setStart(info.getEnd() - ONBOARDING_REPLAY_LOOKBACK);
+
+    long lastTimestamp = detectionConfig.getLastTimestamp();
+    // If no value is present, set the default lookback
+    if (lastTimestamp < 0) {

Review comment:
       Since the default value of lastTimestamp will be 0 (when not set), we 
should probably change this condition here to not trigger detection on the 
entire dataset.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org

Reply via email to