[GitHub] [iceberg] szehon-ho commented on a diff in pull request #6661: Core: Support delete file stats in partitions metadata table

via GitHub Fri, 10 Mar 2023 16:03:30 -0800


szehon-ho commented on code in PR #6661:
URL: https://github.com/apache/iceberg/pull/6661#discussion_r1132981630



##########
core/src/main/java/org/apache/iceberg/PartitionsTable.java:
##########
@@ -220,21 +257,53 @@ Iterable<Partition> all() {
 
   static class Partition {
     private final StructLike key;
-    private long recordCount;
-    private int fileCount;
     private int specId;
+    private long dataRecordCount;
+    private int dataFileCount;
+
+    private final Set<DeleteFile> equalityDeleteFiles;
+    private final Set<DeleteFile> positionDeleteFiles;
 
     Partition(StructLike key) {
       this.key = key;
-      this.recordCount = 0;
-      this.fileCount = 0;
       this.specId = 0;
+      this.dataRecordCount = 0;
+      this.dataFileCount = 0;
+      this.positionDeleteFiles = Sets.newHashSet();
+      this.equalityDeleteFiles = Sets.newHashSet();
+    }
+
+    private void update(FileScanTask task) {

Review Comment:
   Actually, I think that approach would be quite expensive (two passes).
   
   Probably the only way to do it effectively, until this whole table is 
migrated over to some kind of view of the 'files' table, is to rewrite 
PartitionsTableScan to use the underlying code directly: 
ManifestFiles.readDeleteManifest() / ManifestFiles.read(), and then go 
through those iterators, instead of going the ManifestGroup.planFiles() / 
FileScanTask way.
   
   That way, we can iterate through the DataFile / DeleteFile entries and 
collect delete file and data file stats in one pass without keeping the files 
in memory. It's definitely doable, but it will be a bit more work. Any thoughts?
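
A rough sketch of the one-pass idea, to make the shape concrete. Everything here is a hypothetical stand-in: `FileEntry`, `Content`, and `Stats` are not Iceberg classes, and the single iterator stands in for whatever the data and delete manifest readers would yield. It assumes each file shows up exactly once in the stream, which is the premise of reading manifest entries directly rather than per-task delete lists, so plain counters can replace the `Set<DeleteFile>` fields.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class OnePassPartitionStats {
  // Hypothetical content kinds; not the Iceberg FileContent enum.
  enum Content { DATA, POSITION_DELETES, EQUALITY_DELETES }

  // Hypothetical stand-in for one manifest entry (data file or delete file).
  static final class FileEntry {
    final String partition;
    final Content content;
    final long recordCount;

    FileEntry(String partition, Content content, long recordCount) {
      this.partition = partition;
      this.content = content;
      this.recordCount = recordCount;
    }
  }

  // Per-partition counters only; no file objects are retained.
  static final class Stats {
    long dataRecordCount;
    int dataFileCount;
    long positionDeleteRecordCount;
    int positionDeleteFileCount;
    long equalityDeleteRecordCount;
    int equalityDeleteFileCount;

    void update(FileEntry entry) {
      switch (entry.content) {
        case DATA:
          dataRecordCount += entry.recordCount;
          dataFileCount += 1;
          break;
        case POSITION_DELETES:
          positionDeleteRecordCount += entry.recordCount;
          positionDeleteFileCount += 1;
          break;
        case EQUALITY_DELETES:
          equalityDeleteRecordCount += entry.recordCount;
          equalityDeleteFileCount += 1;
          break;
        default:
          throw new IllegalArgumentException("Unknown content: " + entry.content);
      }
    }
  }

  // Single pass: each entry is folded into its partition's counters and then dropped.
  static Map<String, Stats> aggregate(Iterator<FileEntry> entries) {
    Map<String, Stats> byPartition = new LinkedHashMap<>();
    while (entries.hasNext()) {
      FileEntry entry = entries.next();
      byPartition.computeIfAbsent(entry.partition, key -> new Stats()).update(entry);
    }
    return byPartition;
  }

  public static void main(String[] args) {
    List<FileEntry> entries = Arrays.asList(
        new FileEntry("p=1", Content.DATA, 10L),
        new FileEntry("p=1", Content.POSITION_DELETES, 3L),
        new FileEntry("p=2", Content.DATA, 7L));
    aggregate(entries.iterator()).forEach((partition, stats) ->
        System.out.println(partition + " dataFiles=" + stats.dataFileCount
            + " posDeleteFiles=" + stats.positionDeleteFileCount));
  }
}
```

Memory then scales with the number of partitions rather than the number of delete files. If the same delete file could appear more than once in the stream, this would over-count and some form of de-duplication would still be needed.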



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


