weizhengte commented on code in PR #12765: URL: https://github.com/apache/doris/pull/12765#discussion_r977089097
##########
fe/fe-core/src/main/java/org/apache/doris/statistics/SQLStatisticsTask.java:
##########
@@ -17,47 +17,119 @@
 package org.apache.doris.statistics;
 
-import org.apache.doris.analysis.SelectStmt;
+import org.apache.doris.catalog.Database;
+import org.apache.doris.catalog.Env;
+import org.apache.doris.catalog.Table;
+import org.apache.doris.common.DdlException;
+import org.apache.doris.common.InvalidFormatException;
+import org.apache.doris.statistics.StatisticsTaskResult.TaskResult;
+import org.apache.doris.statistics.StatsGranularity.Granularity;
+import org.apache.doris.statistics.util.InternalQuery;
+import org.apache.doris.statistics.util.InternalQueryResult;
+import org.apache.doris.statistics.util.InternalQueryResult.ResultRow;
+import org.apache.doris.statistics.util.InternalSqlTemplate;
+
+import com.google.common.collect.Lists;
+import com.google.common.collect.Maps;
 
 import java.util.List;
+import java.util.Map;
 
 /**
  * A statistics task that collects statistics by executing query.
  * The results of the query will be returned as @StatisticsTaskResult.
  */
 public class SQLStatisticsTask extends StatisticsTask {
-    private SelectStmt query;
+    private String statement;
 
     public SQLStatisticsTask(long jobId, List<StatisticsDesc> statsDescs) {
         super(jobId, statsDescs);
     }
 
     @Override
     public StatisticsTaskResult call() throws Exception {
-        // TODO
-        // step1: construct query by statsDescList
-        constructQuery();
-        // step2: execute query
-        // the result should be sequence by @statsTypeList
-        List<String> queryResultList = executeQuery(query);
-        // step3: construct StatisticsTaskResult by query result
-        constructTaskResult(queryResultList);
-        return null;
+        checkStatisticsDesc();
+        List<TaskResult> taskResults = Lists.newArrayList();
+
+        for (StatisticsDesc statsDesc : statsDescs) {
+            statement = constructQuery(statsDesc);
+            TaskResult taskResult = executeQuery(statsDesc);
+            taskResults.add(taskResult);
+        }
+
+        return new StatisticsTaskResult(taskResults);
     }
 
-    protected void constructQuery() {
-        // TODO
-        // step1: construct FROM by @granularityDesc
-        // step2: construct SELECT LIST by @statsTypeList
+    protected String constructQuery(StatisticsDesc statsDesc) throws DdlException,
+            InvalidFormatException {
+        Map<String, String> params = getQueryParams(statsDesc);
+        List<StatsType> statsTypes = statsDesc.getStatsTypes();
+        StatsType type = statsTypes.get(0);
+
+        StatsGranularity statsGranularity = statsDesc.getStatsGranularity();
+        Granularity granularity = statsGranularity.getGranularity();
+        boolean nonPartitioned = granularity != Granularity.PARTITION;
+
+        switch (type) {

Review Comment:
   I think both ways can work. Through testing, we found that when a single SQL statement collects multiple metrics, its query time grows roughly in proportion to the number of metrics. For NDV alone, a table with 100 million rows takes about 3s; with a large amount of data and many columns it takes longer. The current strategy is therefore to collect statistics in parallel, one query per metric. Test data is in @EmmyMiao87's doc: https://docs.google.com/document/d/1u1L6XhyzKShoyYRwFQ6kE1rnvY2iFwauwg289au5Qq0/edit
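   To make the trade-off concrete, below is a minimal sketch of the two strategies being compared: one combined SQL computing every metric versus one SQL per metric submitted in parallel. This is not the PR's actual code; the class name `StatsCollector`, the helper `executeSql`, and the per-metric SQL strings are hypothetical stand-ins for the internal query framework.

   import java.util.List;
   import java.util.concurrent.CompletableFuture;
   import java.util.concurrent.ExecutorService;
   import java.util.concurrent.Executors;

   // Hypothetical illustration only: compares collecting several column metrics
   // with a single combined SQL versus one SQL per metric executed in parallel.
   public class StatsCollector {

       // Strategy 1: one SQL computes every metric; its runtime tends to grow
       // with the number of aggregate expressions in the select list.
       static String combinedQuery(String table, String col) {
           return "SELECT COUNT(1), NDV(" + col + "), MAX(" + col + "), MIN(" + col + ") FROM " + table;
       }

       // Strategy 2: one SQL per metric, so each query stays cheap and the
       // queries can run concurrently (the approach described in the comment).
       static List<String> perMetricQueries(String table, String col) {
           return List.of(
                   "SELECT COUNT(1) FROM " + table,
                   "SELECT NDV(" + col + ") FROM " + table,
                   "SELECT MAX(" + col + "), MIN(" + col + ") FROM " + table);
       }

       public static void main(String[] args) {
           ExecutorService pool = Executors.newFixedThreadPool(4);
           List<CompletableFuture<Void>> futures = perMetricQueries("db.tbl", "c1").stream()
                   .map(sql -> CompletableFuture.runAsync(() -> executeSql(sql), pool))
                   .toList();
           futures.forEach(CompletableFuture::join);
           pool.shutdown();
       }

       // Placeholder: in Doris this would go through the internal query machinery.
       static void executeSql(String sql) {
           System.out.println("executing: " + sql);
       }
   }

   With the per-metric strategy, the slowest single metric (e.g. NDV) bounds the wall-clock time instead of every metric adding to one long-running query.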