englefly commented on code in PR #25079:
URL: https://github.com/apache/doris/pull/25079#discussion_r1349510034


##########
fe/fe-core/src/main/java/org/apache/doris/statistics/OlapAnalysisTask.java:
##########
@@ -65,7 +71,25 @@ public class OlapAnalysisTask extends BaseAnalysisTask {
                     + "MIN(`${colName}`) AS min, "
                     + "MAX(`${colName}`) AS max, "
                     + "${dataSizeFunction} AS data_size, "
-                    + "NOW() FROM `${dbName}`.`${tblName}` PARTITION ${partitionName}  ${sampleExpr}";
+                    + "NOW() FROM `${dbName}`.`${tblName}` PARTITION ${partitionName}";
+
+    private static final String SAMPLE_COLUMN_SQL_TEMPLATE = "SELECT \n"
+            + "CONCAT(${tblId}, '-', ${idxId}, '-', '${colId}') AS id, \n"
+            + "${catalogId} AS catalog_id, \n"
+            + "${dbId} AS db_id, \n"
+            + "${tblId} AS tbl_id, \n"
+            + "${idxId} AS idx_id, \n"
+            + "'${colId}' AS col_id, \n"
+            + "NULL AS part_id, \n"
+            + "COUNT(1) * ${ratio} AS row_count, \n"
+            + "NDV(`${colName}`) * ${ratio}  AS ndv, \n"
+            + "SUM(CASE WHEN `${colName}` IS NULL THEN 1 ELSE 0 END) * ${ratio} AS null_count, \n"

Review Comment:
   How about renaming "ratio" to "scale", as in "sum(...) * ${scale}"? This double is always larger than 1.
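To illustrate the naming point, assuming `${ratio}` is the table's total row count divided by the number of sampled rows: each sampled count is extrapolated by multiplying it by that factor, which is greater than 1 whenever only a proper subset of the table is read. A minimal sketch under that assumption; `SampleScale`, `scale` and `extrapolate` are hypothetical names, not Doris APIs.

```java
public class SampleScale {
    /**
     * Hypothetical helper: factor used to extrapolate sampled aggregates.
     * Equals 1.0 when every row is read and is larger than 1.0 for a proper sample.
     */
    static double scale(long totalRows, long sampledRows) {
        if (sampledRows <= 0 || sampledRows > totalRows) {
            throw new IllegalArgumentException("sampledRows must be in (0, totalRows]");
        }
        return (double) totalRows / sampledRows;
    }

    /** Extrapolates a sampled COUNT-style aggregate such as row_count or null_count. */
    static double extrapolate(long sampledValue, double factor) {
        return sampledValue * factor;
    }

    public static void main(String[] args) {
        double s = scale(1_000_000L, 10_000L);
        System.out.println(s);                   // 100.0
        System.out.println(extrapolate(37L, s)); // 3700.0
    }
}
```

Calling it a "scale" makes the direction explicit: the value multiplies sample results up to table size.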



##########
fe/fe-core/src/main/java/org/apache/doris/statistics/BaseAnalysisTask.java:
##########
@@ -222,23 +226,23 @@ protected String getDataSizeFunction(Column column) {
         return "COUNT(1) * " + column.getType().getSlotSize();
     }
 
-    protected String getSampleExpression() {
+    protected TableSample getTableSample() {
         if (info.forceFull) {
-            return "";
+            return null;
         }
-        int sampleRows = info.sampleRows;
+        long sampleRows = info.sampleRows;
         if (info.analysisMethod == AnalysisMethod.FULL) {
             if (Config.enable_auto_sample
                     && tbl.getDataSize(true) > Config.huge_table_lower_bound_size_in_bytes) {
                 sampleRows = Config.huge_table_default_sample_rows;
             } else {
-                return "";
+                return null;
             }
         }
         if (info.samplePercent > 0) {
-            return String.format("TABLESAMPLE(%d PERCENT)", info.samplePercent);
+            return new TableSample(true, (long) info.samplePercent);

Review Comment:
   1. samplePercent takes priority over sampleRows. Do we have documentation about this?
   2. It would be better to convert the percentage to sampleRows here. For example, sampling 5% means sampleRows = table_rows * 5%.
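A minimal sketch of the suggested conversion, assuming the table's row count is available as a `long` and `samplePercent` is an integer in (0, 100]. `SamplePercent`, `percentToSampleRows` and `resolveSampleRows` are hypothetical names; `resolveSampleRows` only mirrors the precedence described in point 1 (a positive percent wins over sampleRows).

```java
public class SamplePercent {
    /**
     * Hypothetical helper: converts a sample percentage into a row count,
     * e.g. 5% of 1,000,000 rows -> 50,000 rows.
     */
    static long percentToSampleRows(long tableRows, int samplePercent) {
        if (tableRows < 0) {
            throw new IllegalArgumentException("tableRows must be >= 0");
        }
        if (samplePercent <= 0 || samplePercent > 100) {
            throw new IllegalArgumentException("samplePercent must be in (0, 100]");
        }
        // Ceiling division, so any non-empty table samples at least one row.
        // Assumes tableRows * samplePercent does not overflow a long.
        return (tableRows * samplePercent + 99) / 100;
    }

    /** A positive percent takes priority over an explicit row count. */
    static long resolveSampleRows(long tableRows, int samplePercent, long sampleRows) {
        if (samplePercent > 0) {
            return percentToSampleRows(tableRows, samplePercent);
        }
        return sampleRows;
    }

    public static void main(String[] args) {
        System.out.println(percentToSampleRows(1_000_000L, 5));         // 50000
        System.out.println(resolveSampleRows(1_000_000L, 0, 200_000L)); // 200000
    }
}
```

Converting early means the rest of the code only has to handle one kind of sample (a row count), which would also make the precedence rule easier to document.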



##########
fe/fe-core/src/main/java/org/apache/doris/statistics/ColStatsData.java:
##########
@@ -54,12 +54,12 @@ public class ColStatsData {
 
     public ColStatsData(ResultRow row) {
         this.statsId = new StatsId(row);
-        this.count = Long.parseLong(row.get(7));
-        this.ndv = Long.parseLong(row.getWithDefault(8, "0"));
-        this.nullCount = Long.parseLong(row.getWithDefault(9, "0"));
+        this.count = (long) Double.parseDouble(row.get(7));

Review Comment:
   Why is count now parsed from a Double?
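One plausible reason, assuming the stored row_count now comes from `COUNT(1) * ${ratio}` in the new sample template: that expression is a DOUBLE, so its text form may contain a fractional part or an exponent, which `Long.parseLong` rejects with a `NumberFormatException`. The sketch below contrasts the two parses; `ParseCount`, `parseCount` and `parsesAsLong` are illustrative names only.

```java
public class ParseCount {
    /** Parses a possibly fractional count; the cast truncates toward zero. */
    static long parseCount(String raw) {
        // Precision is lost for values above 2^53, which a double cannot hold exactly.
        return (long) Double.parseDouble(raw);
    }

    /** Reports whether Long.parseLong accepts the text as-is. */
    static boolean parsesAsLong(String raw) {
        try {
            Long.parseLong(raw);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String scaled = "123456.7";
        System.out.println(parsesAsLong(scaled)); // false
        System.out.println(parseCount(scaled));   // 123456
        System.out.println(parseCount("1.5E7"));  // 15000000
    }
}
```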



##########
fe/fe-core/src/main/java/org/apache/doris/statistics/HMSAnalysisTask.java:
##########
@@ -65,7 +65,7 @@ public class HMSAnalysisTask extends BaseAnalysisTask {
             + "MAX(`${colName}`) AS max, "
             + "${dataSizeFunction} AS data_size, "
             + "NOW() "
-            + "FROM `${catalogName}`.`${dbName}`.`${tblName}` ${sampleExpr}";
+            + "FROM `${catalogName}`.`${dbName}`.`${tblName}`";

Review Comment:
   It seems that you changed the grammar; why are there no changes to the .g4 files?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

