vincentchenjl commented on a change in pull request #5866:
URL: https://github.com/apache/incubator-pinot/pull/5866#discussion_r472489939



##########
File path: 
thirdeye/thirdeye-pinot/src/main/java/org/apache/pinot/thirdeye/api/detection/AnomalyDetectionResource.java
##########
@@ -345,18 +323,15 @@ DatasetConfigDTO generateDatasetConfig(JsonNode 
payloadNode, String suffix) {
     return datasetConfigDTO;
   }
 
-  MetricConfigDTO generateMetricConfig(JsonNode payloadNode, String suffix,
-        DatasetConfigDTO datasetConfigDTO)
-      throws JsonProcessingException {
+  MetricConfigDTO generateMetricConfig(JsonNode payloadNode, String suffix) {
     MetricConfigDTO metricConfigDTO = new MetricConfigDTO();
-    JsonNode dataNode = payloadNode.get(DATA_FIELD);
 
     // Default configuration
-    metricConfigDTO.setName(DEFAULT_METRIC_NAME + suffix);
+    metricConfigDTO.setName(DEFAULT_METRIC_NAME);

Review comment:
       Any reason we need to hardcode this here?

##########
File path: 
thirdeye/thirdeye-pinot/src/main/java/org/apache/pinot/thirdeye/api/detection/AnomalyDetectionResource.java
##########
@@ -511,24 +499,19 @@ private TaskDTO pollingTask(long taskId) {
   }
 
   private void cleanStates(MetricConfigDTO metricConfigDTO, DatasetConfigDTO 
datasetConfigDTO) {
+    // Clean up ad hoc data
     if (datasetConfigDTO != null) {
-      datasetConfigDAO.delete(datasetConfigDTO);
-      LOG.info("Deleted dataset: {}", datasetConfigDTO);
-
-      int anomalyCnt = anomalyDAO.deleteByPredicate(
-          Predicate.EQ("collection", datasetConfigDTO.getName()));
-      LOG.info("Deleted {} anomalies with dataset {}",
-          anomalyCnt, datasetConfigDTO.getName());
+      int onlineDetectionDataCnt = onlineDetectionDataDAO
+          .deleteByPredicate(Predicate.EQ("dataset", 
datasetConfigDTO.getName()));

Review comment:
       If we store the ad-hoc data, do we still want store the metadata in 
normalized form? Since we clean up the dataset every time, we can store all the 
metadata into the ad hoc data table.

##########
File path: 
thirdeye/thirdeye-pinot/src/main/java/org/apache/pinot/thirdeye/api/detection/AnomalyDetectionResource.java
##########
@@ -464,6 +420,38 @@ TaskDTO generateTaskConfig(DetectionConfigDTO 
detectionConfigDTO,
     return taskDTO;
   }
 
+  long saveOnlineDetectionData(JsonNode payloadNode,
+      DatasetConfigDTO datasetConfigDTO, MetricConfigDTO metricConfigDTO)
+        throws JsonProcessingException {
+    JsonNode dataNode = payloadNode.get(DATA_FIELD);
+    String timeColumnName = datasetConfigDTO.getTimeColumn();
+    String datasetName = datasetConfigDTO.getDataset();
+    String metricName = metricConfigDTO.getName();
+
+    // Check if time & metric columns exist in adhoc data
+    ArrayNode columnsNode = dataNode.withArray(COLUMNS_FIELD);
+    int[] colIndices = findTimeAndMetricColumns(columnsNode,
+        timeColumnName, metricName);
+    int timeColIdx = colIndices[0];
+    int metricColIdx = colIndices[1];
+    Preconditions.checkArgument(metricColIdx>=0 && timeColIdx>=0,
+        String.format("metric: %s or time: %s not found in adhoc data.",
+            metricName, timeColumnName));
+
+    // Save online data
+    OnlineDetectionDataDTO onlineDetectionDataDTO = new 
OnlineDetectionDataDTO();
+    onlineDetectionDataDTO.setDataset(datasetName);
+    onlineDetectionDataDTO.setMetric(metricName);
+    
onlineDetectionDataDTO.setOnlineDetectionData(this.objectMapper.writeValueAsString(dataNode));

Review comment:
       Since we are storing the time series data in JSON, we might exceed the 
limit of MySQL text field, which is 64KB. We should at least make sure that the 
field should not be overflowed the large data users posted.

##########
File path: thirdeye/thirdeye-pinot/src/main/resources/schema/create-schema.sql
##########
@@ -437,3 +437,15 @@ create index rootcause_template_id_idx ON 
rootcause_template_index(base_id);
 create index rootcause_template_owner_idx ON rootcause_template_index(owner);
 create index rootcause_template_metric_idx on 
rootcause_template_index(metric_id);
 create index rootcause_template_config_application_idx ON 
rootcause_template_index(`application`);
+
+create table if not exists online_detection_data_index (
+    base_id bigint(20) not null,
+    dataset varchar(200),
+    metric varchar(200),
+    create_time timestamp default 0,
+    update_time timestamp default current_timestamp,
+    version int(10)
+) ENGINE=InnoDB;

Review comment:
       For long term, we might want to consider using BLOB/CLOB to storing ad 
hoc data so that we can support dataset larger than 64KB, not to mention that 
the data is stored in JSON format, which is really inefficient ways to store 
data, compare to any binary format. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org

Reply via email to