pvary commented on code in PR #12824: URL: https://github.com/apache/iceberg/pull/12824#discussion_r2077967765
##########
core/src/main/java/org/apache/iceberg/actions/BinPackRewriteFilePlanner.java:
##########
@@ -199,30 +214,48 @@ protected long defaultTargetFileSize() {
   public FileRewritePlan<FileGroupInfo, FileScanTask, DataFile, RewriteFileGroup> plan() {
     StructLikeMap<List<List<FileScanTask>>> plan = planFileGroups();
     RewriteExecutionContext ctx = new RewriteExecutionContext();
-    Stream<RewriteFileGroup> groups =
-        plan.entrySet().stream()
-            .filter(e -> !e.getValue().isEmpty())
-            .flatMap(
-                e -> {
-                  StructLike partition = e.getKey();
-                  List<List<FileScanTask>> scanGroups = e.getValue();
-                  return scanGroups.stream()
-                      .map(
-                          tasks -> {
-                            long inputSize = inputSize(tasks);
-                            return newRewriteGroup(
-                                ctx,
-                                partition,
-                                tasks,
-                                inputSplitSize(inputSize),
-                                expectedOutputFiles(inputSize));
-                          });
-                })
-            .sorted(RewriteFileGroup.comparator(rewriteJobOrder));
+    List<RewriteFileGroup> selectedFileGroups = new ArrayList<>();
+    AtomicInteger fileCountRunner = new AtomicInteger();
+    plan.entrySet().stream()

Review Comment:
   Makes sense. If the NN is not the issue, then it makes sense to limit the number of files ASAP.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org