Re: [PR] [improvement](statistics)Return -1 when external table row count is unknown. [doris]

via GitHub Wed, 07 Aug 2024 00:12:41 -0700


Jibing-Li commented on code in PR #38990:
URL: https://github.com/apache/doris/pull/38990#discussion_r1706485695



##########
fe/fe-core/src/main/java/org/apache/doris/datasource/paimon/PaimonExternalTable.java:
##########
@@ -187,22 +187,17 @@ public BaseAnalysisTask createAnalysisTask(AnalysisInfo info) {
     @Override
     public long fetchRowCount() {
         makeSureInitialized();
-        try {
-            long rowCount = 0;
-            Optional<SchemaCacheValue> schemaCacheValue = getSchemaCacheValue();
-            Table paimonTable = schemaCacheValue.map(value -> ((PaimonSchemaCacheValue) value).getPaimonTable())
-                    .orElse(null);
-            if (paimonTable == null) {
-                return -1;
-            }
-            List<Split> splits = paimonTable.newReadBuilder().newScan().plan().splits();
-            for (Split split : splits) {
-                rowCount += split.rowCount();
-            }
-            return rowCount;
-        } catch (Exception e) {
-            LOG.warn("Fail to collect row count for db {} table {}", dbName, name, e);
+        long rowCount = 0;
+        Optional<SchemaCacheValue> schemaCacheValue = getSchemaCacheValue();
+        Table paimonTable = schemaCacheValue.map(value -> ((PaimonSchemaCacheValue) value).getPaimonTable())
+                .orElse(null);
+        if (paimonTable == null) {
+            return -1;
+        }
+        List<Split> splits = paimonTable.newReadBuilder().newScan().plan().splits();

Review Comment:
   This could be expensive when the table is large. It is not related to this
PR, though, so we can improve it in a separate PR if needed.



##########
fe/fe-core/src/main/java/org/apache/doris/datasource/iceberg/IcebergUtils.java:
##########
@@ -592,22 +592,17 @@ public static List<Column> getSchema(ExternalCatalog catalog, String dbName, Str
      * @return estimated row count
      */
     public static long getIcebergRowCount(ExternalCatalog catalog, String dbName, String tbName) {
-        try {
-            Table icebergTable = Env.getCurrentEnv()
-                    .getExtMetaCacheMgr()
-                    .getIcebergMetadataCache()
-                    .getIcebergTable(catalog, dbName, tbName);
-            Snapshot snapshot = icebergTable.currentSnapshot();
-            if (snapshot == null) {
-                // empty table
-                return 0;
-            }
-            Map<String, String> summary = snapshot.summary();
-            return Long.parseLong(summary.get(TOTAL_RECORDS)) - Long.parseLong(summary.get(TOTAL_POSITION_DELETES));
-        } catch (Exception e) {
-            LOG.warn("Fail to collect row count for db {} table {}", dbName, tbName, e);
+        Table icebergTable = Env.getCurrentEnv()
+                .getExtMetaCacheMgr()
+                .getIcebergMetadataCache()
+                .getIcebergTable(catalog, dbName, tbName);
+        Snapshot snapshot = icebergTable.currentSnapshot();

Review Comment:
   I checked the code: the table may be null when the Iceberg metadata cache is
not loaded. I don't think that is a problem, because the NPE would be caught in
the caller, which then returns the default value -1. Meanwhile, the call
triggers the Iceberg metadata cache to load the table, so we can get it next
time. We can fix this in a separate PR if needed, but I feel we don't need to
do anything about it right now.
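
   To illustrate the behavior described above, here is a minimal sketch of a
caller that catches any exception (including an NPE) from a row count fetcher
and falls back to -1. All names here (RowCountFallbackSketch, RowCountFetcher,
fetchOrUnknown, UNKNOWN_ROW_COUNT) are hypothetical and are not taken from the
Doris code base; the real caller may be structured differently.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not Doris code: the caller owns the try/catch,
// so individual fetchers can simply throw when metadata is unavailable.
class RowCountFallbackSketch {
    // Assumed sentinel for "row count unknown", matching the -1 in the PR title.
    static final long UNKNOWN_ROW_COUNT = -1;

    // Hypothetical fetcher abstraction; it is allowed to throw any exception.
    interface RowCountFetcher {
        long fetch() throws Exception;
    }

    // Runs the fetcher and maps any failure to UNKNOWN_ROW_COUNT.
    static long fetchOrUnknown(RowCountFetcher fetcher) {
        try {
            return fetcher.fetch();
        } catch (Exception e) {
            // A real caller would log the failure here.
            return UNKNOWN_ROW_COUNT;
        }
    }

    // Simulates a metadata cache lookup: a missing entry yields null,
    // and unboxing that null throws a NullPointerException.
    static long rowCountFromCache(Map<String, Long> cache, String table) {
        Long cached = cache.get(table);
        return cached.longValue();
    }

    public static void main(String[] args) {
        Map<String, Long> cache = new HashMap<>();
        // Cache not loaded yet: the NPE is caught and -1 is returned.
        System.out.println(fetchOrUnknown(() -> rowCountFromCache(cache, "t1")));
        // After the cache is populated, the real value comes back.
        cache.put("t1", 100L);
        System.out.println(fetchOrUnknown(() -> rowCountFromCache(cache, "t1")));
    }
}
```

   Keeping the try/catch in one place means each table type's fetcher does not
need its own exception handling, which matches the direction of this change.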



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

