RussellSpitzer commented on code in PR #12824:
URL: https://github.com/apache/iceberg/pull/12824#discussion_r2077932795


##########
core/src/main/java/org/apache/iceberg/actions/BinPackRewriteFilePlanner.java:
##########
@@ -199,30 +214,48 @@ protected long defaultTargetFileSize() {
   public FileRewritePlan<FileGroupInfo, FileScanTask, DataFile, RewriteFileGroup> plan() {
     StructLikeMap<List<List<FileScanTask>>> plan = planFileGroups();
     RewriteExecutionContext ctx = new RewriteExecutionContext();
-    Stream<RewriteFileGroup> groups =
-        plan.entrySet().stream()
-            .filter(e -> !e.getValue().isEmpty())
-            .flatMap(
-                e -> {
-                  StructLike partition = e.getKey();
-                  List<List<FileScanTask>> scanGroups = e.getValue();
-                  return scanGroups.stream()
-                      .map(
-                          tasks -> {
-                            long inputSize = inputSize(tasks);
-                            return newRewriteGroup(
-                                ctx,
-                                partition,
-                                tasks,
-                                inputSplitSize(inputSize),
-                                expectedOutputFiles(inputSize));
-                          });
-                })
-            .sorted(RewriteFileGroup.comparator(rewriteJobOrder));
+    List<RewriteFileGroup> selectedFileGroups = new ArrayList<>();
+    AtomicInteger fileCountRunner = new AtomicInteger();
+    plan.entrySet().stream()

Review Comment:
   I changed my mind based on a Slack discussion yesterday. A user was
hitting driver-side OOMs, and since we are aggregating things here in memory,
it may be better to trim the number of entities we create as early as we can.
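
   As a rough illustration of trimming early, here is a minimal sketch in plain Java. It is not the code in this PR: `EarlyTrimSketch`, `Group`, `selectGroups`, and `maxFiles` are hypothetical names, and the idea that the limit is a running file count is an assumption made only for the example. It stops building group objects once the limit is reached, rather than materializing every group and discarding the extras afterwards.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class EarlyTrimSketch {

  // Hypothetical stand-in for a planned file group; not an Iceberg class.
  static final class Group {
    final String partition;
    final int fileCount;

    Group(String partition, int fileCount) {
      this.partition = partition;
      this.fileCount = fileCount;
    }
  }

  // Walks the plan in iteration order and creates Group objects only until the
  // running file count reaches maxFiles (an assumed limit). Groups past that
  // point are never instantiated, which bounds what is held in memory.
  static List<Group> selectGroups(Map<String, List<Integer>> plan, int maxFiles) {
    List<Group> selected = new ArrayList<>();
    int filesSoFar = 0;
    for (Map.Entry<String, List<Integer>> entry : plan.entrySet()) {
      for (int fileCount : entry.getValue()) {
        if (filesSoFar >= maxFiles) {
          return selected; // limit reached: stop before creating more groups
        }
        selected.add(new Group(entry.getKey(), fileCount));
        filesSoFar += fileCount;
      }
    }
    return selected;
  }
}
```

   The trade-off in this sketch is that selection follows iteration order, so any ordering that should influence which groups are kept would have to be applied before the cutoff.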



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


