Jibing-Li commented on code in PR #25175:
URL: https://github.com/apache/doris/pull/25175#discussion_r1353830896


##########
fe/fe-core/src/main/java/org/apache/doris/catalog/external/HMSExternalTable.java:
##########
@@ -635,6 +636,30 @@ public void gsonPostProcess() throws IOException {
         super.gsonPostProcess();
         estimatedRowCount = -1;
     }
+
+    @Override
+    public List<Long> getChunkSizes() {
+        HiveMetaStoreCache.HivePartitionValues partitionValues = StatisticsUtil.getPartitionValuesForTable(this);
+        List<HiveMetaStoreCache.FileCacheValue> filesByPartitions
+                = StatisticsUtil.getFilesForPartitions(this, partitionValues, 0);
+        List<Long> result = Lists.newArrayList();
+        for (HiveMetaStoreCache.FileCacheValue files : filesByPartitions) {
+            for (HiveMetaStoreCache.HiveFileStatus file : files.getFiles()) {
+                result.add(file.getLength());
+            }
+        }
+        return result;
+    }
+
+    @Override
+    public long getDataSize(boolean singleReplica) {
+        List<Long> chunkSizes = getChunkSizes();

Review Comment:
   This is a heavy operation. As we discussed earlier, it fetches all the 
files a second time, which is redundant.
   We can usually get the total size of a Hive table from HMS, but this call 
needs the size of each file, not just the total. We use the per-file sizes to 
calculate an accurate sample ratio.
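
   To illustrate why the per-file sizes matter, here is a minimal sketch, not 
the code in this PR. It assumes sampling happens at whole-file granularity and 
that files are picked in list order until a target byte count is reached; the 
class `SampleRatioSketch`, the method `sampleRatio`, and the `targetBytes` 
parameter are hypothetical names used only for this example.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: shows why the total size alone is not enough
// when whole files (chunks) are the unit of sampling.
class SampleRatioSketch {

    // Assumption: files are taken in list order until at least
    // targetBytes have been collected. The returned ratio is the
    // fraction of the table's bytes that were actually sampled.
    static double sampleRatio(List<Long> chunkSizes, long targetBytes) {
        long total = 0;
        for (long size : chunkSizes) {
            total += size;
        }
        if (total == 0) {
            // Empty table: treat it as fully sampled.
            return 1.0;
        }
        long sampled = 0;
        for (long size : chunkSizes) {
            if (sampled >= targetBytes) {
                break;
            }
            sampled += size;
        }
        return (double) sampled / total;
    }

    public static void main(String[] args) {
        List<Long> sizes = Arrays.asList(100L, 300L, 600L);
        System.out.println(sampleRatio(sizes, 250L));
    }
}
```

   With files of 100, 300, and 600 bytes and a 250-byte target, the first two 
files are read, so the actual ratio is 400 / 1000 = 0.4 rather than the 0.25 
that dividing the target by the total size would suggest.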



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

