Jibing-Li commented on code in PR #25175: URL: https://github.com/apache/doris/pull/25175#discussion_r1353830896
##########
fe/fe-core/src/main/java/org/apache/doris/catalog/external/HMSExternalTable.java:
##########
@@ -635,6 +636,30 @@ public void gsonPostProcess() throws IOException {
         super.gsonPostProcess();
         estimatedRowCount = -1;
     }
+
+    @Override
+    public List<Long> getChunkSizes() {
+        HiveMetaStoreCache.HivePartitionValues partitionValues = StatisticsUtil.getPartitionValuesForTable(this);
+        List<HiveMetaStoreCache.FileCacheValue> filesByPartitions
+                = StatisticsUtil.getFilesForPartitions(this, partitionValues, 0);
+        List<Long> result = Lists.newArrayList();
+        for (HiveMetaStoreCache.FileCacheValue files : filesByPartitions) {
+            for (HiveMetaStoreCache.HiveFileStatus file : files.getFiles()) {
+                result.add(file.getLength());
+            }
+        }
+        return result;
+    }
+
+    @Override
+    public long getDataSize(boolean singleReplica) {
+        List<Long> chunkSizes = getChunkSizes();

Review Comment:
   This is a heavy operation. As we discussed earlier, it causes a redundant fetch of all the files. We can usually get the total size of a Hive table from HMS, but this call needs not only the total size but also the size of each file. We use the per-file sizes to calculate an accurate sample ratio.
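   To illustrate why the per-file sizes matter and the total alone does not, here is a minimal sketch. It is not the code in this PR: the class `SampleRatioSketch`, its methods, and the "pick whole files until a byte target is reached" rule are all assumptions made for illustration. Under that assumption, the fraction of data actually sampled depends on where the file boundaries fall, so it can only be computed from the individual sizes.

```java
import java.util.List;

// Hypothetical illustration only; names and sampling rule are assumptions,
// not taken from the Doris code base.
class SampleRatioSketch {

    /** Sums all file sizes (in bytes). */
    static long totalSize(List<Long> chunkSizes) {
        long total = 0;
        for (long size : chunkSizes) {
            total += size;
        }
        return total;
    }

    /**
     * Selects whole files in list order until at least targetBytes are covered,
     * then returns selected bytes divided by total bytes.
     */
    static double actualSampleRatio(List<Long> chunkSizes, long targetBytes) {
        long total = totalSize(chunkSizes);
        if (total == 0) {
            return 1.0; // nothing to sample; treat as a full scan
        }
        long picked = 0;
        for (long size : chunkSizes) {
            if (picked >= targetBytes) {
                break;
            }
            picked += size; // files are taken whole, so we may overshoot the target
        }
        return (double) picked / total;
    }
}
```

   With file sizes of 100, 200 and 300 bytes and a target of 150 bytes, this sketch selects the first two files, so the ratio it reports is 0.5 rather than the 0.25 that the target divided by the total size would suggest.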