coderfender commented on code in PR #12824: URL: https://github.com/apache/iceberg/pull/12824#discussion_r2082637481
##########
core/src/main/java/org/apache/iceberg/actions/BinPackRewriteFilePlanner.java:
##########
@@ -199,30 +214,48 @@ protected long defaultTargetFileSize() {
   public FileRewritePlan<FileGroupInfo, FileScanTask, DataFile, RewriteFileGroup> plan() {
     StructLikeMap<List<List<FileScanTask>>> plan = planFileGroups();
     RewriteExecutionContext ctx = new RewriteExecutionContext();
-    Stream<RewriteFileGroup> groups =
-        plan.entrySet().stream()
-            .filter(e -> !e.getValue().isEmpty())
-            .flatMap(
-                e -> {
-                  StructLike partition = e.getKey();
-                  List<List<FileScanTask>> scanGroups = e.getValue();
-                  return scanGroups.stream()
-                      .map(
-                          tasks -> {
-                            long inputSize = inputSize(tasks);
-                            return newRewriteGroup(
-                                ctx,
-                                partition,
-                                tasks,
-                                inputSplitSize(inputSize),
-                                expectedOutputFiles(inputSize));
-                          });
-                })
-            .sorted(RewriteFileGroup.comparator(rewriteJobOrder));
+    List<RewriteFileGroup> selectedFileGroups = new ArrayList<>();
+    AtomicInteger fileCountRunner = new AtomicInteger();
+    plan.entrySet().stream()

Review Comment:
   @pvary, I pushed a commit that moves the pruning logic to right after we get the fileScanTasks from the scan API. One advantage is that the implementation is simpler than the approach above. The other thing to note is that the file scan tasks being pruned are always guaranteed to be random, since we prune before grouping by partition. Let me know whether you think this is a clearer approach than the previous one.
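   For illustration, here is a minimal sketch of the "prune first, then group" ordering described in the comment. It is not the code from the commit: `Task`, `prune`, `groupByPartition`, and the `maxFiles` cap are hypothetical stand-ins, and the sketch simply keeps the first `maxFiles` tasks in whatever order the scan yields them.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class PruneBeforeGrouping {
  // Hypothetical stand-in for FileScanTask: only the fields this sketch needs.
  static final class Task {
    final String partition;
    final String path;

    Task(String partition, String path) {
      this.partition = partition;
      this.path = path;
    }
  }

  // Step 1: keep at most maxFiles tasks, in the order the scan returned them.
  // No partition information is consulted here (assumed cap, not an Iceberg option).
  static List<Task> prune(Iterable<Task> scanned, int maxFiles) {
    List<Task> kept = new ArrayList<>();
    for (Task task : scanned) {
      if (kept.size() >= maxFiles) {
        break;
      }
      kept.add(task);
    }
    return kept;
  }

  // Step 2: group the already-pruned tasks by partition.
  static Map<String, List<Task>> groupByPartition(List<Task> tasks) {
    Map<String, List<Task>> groups = new LinkedHashMap<>();
    for (Task task : tasks) {
      groups.computeIfAbsent(task.partition, key -> new ArrayList<>()).add(task);
    }
    return groups;
  }
}
```

   Because the cap is applied before any grouping, the planner code that builds groups does not need a running file counter; which tasks survive depends only on the order in which the scan yields them.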