findepi commented on code in PR #10691:
URL: https://github.com/apache/iceberg/pull/10691#discussion_r1686408402


##########
core/src/main/java/org/apache/iceberg/util/ParallelIterable.java:
##########
@@ -20,84 +20,117 @@
 
 import java.io.Closeable;
 import java.io.IOException;
+import java.io.UncheckedIOException;
+import java.util.ArrayDeque;
+import java.util.Deque;
 import java.util.Iterator;
 import java.util.NoSuchElementException;
+import java.util.Optional;
+import java.util.concurrent.CompletableFuture;
 import java.util.concurrent.ConcurrentLinkedQueue;
 import java.util.concurrent.ExecutionException;
 import java.util.concurrent.ExecutorService;
-import java.util.concurrent.Future;
-import org.apache.iceberg.exceptions.RuntimeIOException;
+import java.util.concurrent.atomic.AtomicBoolean;
+import java.util.function.Supplier;
 import org.apache.iceberg.io.CloseableGroup;
 import org.apache.iceberg.io.CloseableIterable;
 import org.apache.iceberg.io.CloseableIterator;
 import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
 import org.apache.iceberg.relocated.com.google.common.collect.Iterables;
+import org.apache.iceberg.relocated.com.google.common.io.Closer;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
 
public class ParallelIterable<T> extends CloseableGroup implements CloseableIterable<T> {
+
+  private static final Logger LOG = LoggerFactory.getLogger(ParallelIterable.class);
+
+  // Logic behind default value: ParallelIterable is often used for file planning.
+  // Assuming that a DataFile or DeleteFile is about 500 bytes, a 30k limit uses 14.3 MB of memory.
+  private static final int DEFAULT_MAX_QUEUE_SIZE = 30_000;

Review Comment:
   Here we should use "the lowest number possible that doesn't affect 
performance negatively", rather than "the highest number possible that we 
believe will not blow up the memory". 50 MB is indeed very little. But an 
uncontrolled 50 MB allocation _per worker thread_, with e.g. 128 threads 
doing some work, is 6.25 GB. Some applications, like Trino, initialize 
their worker pool size to the number of cores or twice that, so 128 
threads is not an absurd number. Of course, a machine with 64 cores will 
have a lot more memory than this mere ~6 GB. Yet ~6 GB of uncontrolled 
(and potentially unnecessary) allocation may lead to problems. The same 
can be said about the 30k size limit (roughly 1.9 GB across 128 threads 
at the assumed 500 bytes per entry). I am not saying this particular 
number is the best one, but I would rather spend more energy on making 
the number _smaller_.
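   The back-of-the-envelope math above can be sketched as a tiny helper 
(purely illustrative; `worstCaseBytes` is a hypothetical name, not part of 
the PR), using the 500-bytes-per-entry assumption from the code comment:

```java
// Hypothetical helper (not in the PR): worst-case bytes buffered across
// all per-thread queues, assuming a fixed per-entry size.
public class QueueMemoryEstimate {

  // maxQueueSize entries x bytesPerEntry bytes x threads parallel queues
  static long worstCaseBytes(int maxQueueSize, int bytesPerEntry, int threads) {
    return (long) maxQueueSize * bytesPerEntry * threads;
  }

  public static void main(String[] args) {
    // One queue at the 30k default with ~500 B entries: ~14.3 MiB
    System.out.println(worstCaseBytes(30_000, 500, 1));   // prints 15000000

    // 128 worker threads, as in the Trino-style sizing above: ~1.9 GB
    System.out.println(worstCaseBytes(30_000, 500, 128)); // prints 1920000000
  }
}
```

   This is why the aggregate across threads, not the per-thread figure, 
should drive the default.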
   
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

