advancedxy commented on code in PR #8346:
URL: https://github.com/apache/iceberg/pull/8346#discussion_r1299213591


##########
core/src/main/java/org/apache/iceberg/BaseFileScanTask.java:
##########
@@ -45,31 +50,67 @@ protected FileScanTask self() {
 
   @Override
   protected FileScanTask newSplitTask(FileScanTask parentTask, long offset, long length) {
-    return new SplitScanTask(offset, length, parentTask);
+    return new SplitScanTask(offset, length, deletesSizeBytes(), parentTask);
   }
 
   @Override
   public List<DeleteFile> deletes() {
-    return ImmutableList.copyOf(deletes);
+    if (deletesAsList == null) {
+      this.deletesAsList = Collections.unmodifiableList(Arrays.asList(deletes));
+    }
+
+    return deletesAsList;
+  }
+
+  @Override
+  public long sizeBytes() {
+    return length() + deletesSizeBytes();
+  }
+
+  @Override
+  public int filesCount() {
+    return 1 + deletes.length;

Review Comment:
   Nit: if more methods are added to the parent class later, how can we make 
sure they are overridden in this class? Otherwise, a new method could 
accidentally materialize `deletesAsList`.
   
   I don't think this question is a blocker, but it would be great to have 
some way (or tests) to detect that.
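   
   One possible way to detect this, sketched with hypothetical stand-in 
classes (`ParentTask`, `ChildTask`) rather than the real Iceberg types, is a 
reflection-based check that lists every overridable method the parent declares 
and the child does not redeclare. This is only an illustration of the idea, 
not something the PR contains:

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

final class OverrideCheck {
  // Hypothetical stand-in for the parent class.
  static class ParentTask {
    public long sizeBytes() {
      return 0L;
    }

    public int filesCount() {
      return 0;
    }
  }

  // Hypothetical stand-in for the subclass; filesCount() is deliberately not overridden.
  static class ChildTask extends ParentTask {
    @Override
    public long sizeBytes() {
      return 1L;
    }
  }

  // Returns names of overridable methods declared by parent that child does not redeclare.
  static List<String> inheritedButNotOverridden(Class<?> parent, Class<?> child) {
    List<String> missing = new ArrayList<>();
    for (Method method : parent.getDeclaredMethods()) {
      int mods = method.getModifiers();
      boolean overridable =
          !method.isSynthetic()
              && !Modifier.isStatic(mods)
              && !Modifier.isPrivate(mods)
              && !Modifier.isFinal(mods);
      if (!overridable) {
        continue;
      }

      try {
        child.getDeclaredMethod(method.getName(), method.getParameterTypes());
      } catch (NoSuchMethodException e) {
        missing.add(method.getName());
      }
    }

    return missing;
  }
}
```

   A real test would likely need an allowlist for methods that are safe to 
inherit, and would fail when the returned list contains anything else.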



##########
core/src/main/java/org/apache/iceberg/BaseFileScanTask.java:
##########
@@ -45,31 +49,67 @@ protected FileScanTask self() {
 
   @Override
   protected FileScanTask newSplitTask(FileScanTask parentTask, long offset, long length) {
-    return new SplitScanTask(offset, length, parentTask);
+    return new SplitScanTask(offset, length, deletesSizeBytes(), parentTask);
   }
 
   @Override
   public List<DeleteFile> deletes() {
-    return ImmutableList.copyOf(deletes);
+    if (deletesAsList == null) {
+      this.deletesAsList = ImmutableList.copyOf(deletes);
+    }
+
+    return deletesAsList;
+  }
+
+  @Override
+  public long sizeBytes() {
+    return length() + deletesSizeBytes();
+  }
+
+  @Override
+  public int filesCount() {
+    return 1 + deletes.length;
   }
 
   @Override
   public Schema schema() {
     return super.schema();
   }
 
+  private long deletesSizeBytes() {
+    if (deletesSizeBytes == null) {

Review Comment:
   8 bytes (size of a long) * 1_000_000 (1 million) = ~8 MB. I wouldn't worry 
too much about this, especially since the tasks are serialized to multiple 
executors in multiple rounds (in the Spark query engine).
   
   However, it does add unnecessary overhead for ScanTasks without delete 
files, so a transient long with lazy calculation would be nice.
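   
   To make the 8-bytes-per-task estimate concrete, here is a minimal sketch 
using plain Java serialization. `EagerTask` and `LazyTask` are hypothetical 
stand-ins, not Iceberg classes, and Spark may be configured with a different 
serializer (for example Kryo), so the numbers are only indicative:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.function.Supplier;

final class SerializedSizeSketch {
  // Hypothetical task that stores the size in a regular (serialized) field.
  static final class EagerTask implements Serializable {
    private static final long serialVersionUID = 1L;
    private final long deletesSizeBytes;

    EagerTask(long deletesSizeBytes) {
      this.deletesSizeBytes = deletesSizeBytes;
    }
  }

  // Hypothetical task that keeps the size in a transient field, which is skipped on write.
  static final class LazyTask implements Serializable {
    private static final long serialVersionUID = 1L;
    private transient long deletesSizeBytes;
  }

  // Serializes `count` freshly created objects into one stream and returns the byte count.
  static int serializedSize(Supplier<Object> factory, int count) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
        for (int i = 0; i < count; i++) {
          out.writeObject(factory.get());
        }
      }
      return bytes.size();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Extra bytes written when the long field is not transient.
  static int extraBytes(int count) {
    return serializedSize(() -> new EagerTask(0L), count)
        - serializedSize(LazyTask::new, count);
  }
}
```

   With this sketch, `SerializedSizeSketch.extraBytes(n)` should come out at 
roughly 8 bytes per task plus a small constant for the extra field descriptor, 
which matches the ~8 MB estimate for a million tasks.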



##########
core/src/main/java/org/apache/iceberg/BaseFileScanTask.java:
##########
@@ -28,6 +28,10 @@ public class BaseFileScanTask extends BaseContentScanTask<FileScanTask, DataFile
     implements FileScanTask {
   private final DeleteFile[] deletes;
 
+  // lazy variables
+  private transient volatile List<DeleteFile> deletesAsList = null;
+  private transient volatile Long deletesSizeBytes = null;

Review Comment:
   Thanks for the detailed explanation.
   
   On second thought, how about declaring it as a plain transient long, such 
as:
   
   ```java
   private transient volatile long deletesSizeBytes = 0;
   
   private long deletesSizeBytes() {
     if (deletesSizeBytes == 0) { // deletesSizeBytes might not be initialized yet
       long size = 0L;
       for (DeleteFile deleteFile : deletes) {
         size += deleteFile.fileSizeInBytes();
       }
       this.deletesSizeBytes = size;
     }
   
     return deletesSizeBytes;
   }
   ```
   
   We only pay a small additional cost when there are no delete files: 
iterating over an empty array on each call.
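   
   A self-contained sketch of this suggestion, assuming a hypothetical 
`LazySizeSketch` class that holds plain `long` sizes instead of `DeleteFile` 
objects. It illustrates two properties the suggestion relies on: a transient 
field comes back as 0 after Java deserialization, so the sentinel check 
triggers a recompute, and a task whose total is 0 recomputes on every call. 
The `computeCount` field exists only for illustration:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

final class LazySizeSketch implements Serializable {
  private static final long serialVersionUID = 1L;

  // Stand-in for DeleteFile[] deletes: one size per delete file.
  private final long[] deleteFileSizes;

  // 0 doubles as the "not computed yet" sentinel.
  private transient volatile long deletesSizeBytes = 0L;

  // Illustration only: counts how often the loop below runs.
  private transient int computeCount = 0;

  LazySizeSketch(long... deleteFileSizes) {
    this.deleteFileSizes = deleteFileSizes.clone();
  }

  long deletesSizeBytes() {
    long cached = deletesSizeBytes; // single volatile read
    if (cached == 0L) {
      computeCount++;
      long size = 0L;
      for (long fileSize : deleteFileSizes) {
        size += fileSize;
      }
      deletesSizeBytes = size;
      cached = size;
    }

    return cached;
  }

  int computeCount() {
    return computeCount;
  }

  // Serializes and deserializes the task in memory.
  static LazySizeSketch roundTrip(LazySizeSketch task) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
        out.writeObject(task);
      }
      try (ObjectInputStream in =
          new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
        return (LazySizeSketch) in.readObject();
      }
    } catch (IOException | ClassNotFoundException e) {
      throw new IllegalStateException(e);
    }
  }
}
```

   Concurrent callers may compute the sum more than once, but each of them 
arrives at the same value, so the race is harmless for this kind of cache.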



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

