findepi commented on code in PR #10691:
URL: https://github.com/apache/iceberg/pull/10691#discussion_r1688735301


##########
core/src/main/java/org/apache/iceberg/util/ParallelIterable.java:
##########
@@ -20,84 +20,117 @@
 
 import java.io.Closeable;
 import java.io.IOException;
+import java.io.UncheckedIOException;
+import java.util.ArrayDeque;
+import java.util.Deque;
 import java.util.Iterator;
 import java.util.NoSuchElementException;
+import java.util.Optional;
+import java.util.concurrent.CompletableFuture;
 import java.util.concurrent.ConcurrentLinkedQueue;
 import java.util.concurrent.ExecutionException;
 import java.util.concurrent.ExecutorService;
-import java.util.concurrent.Future;
-import org.apache.iceberg.exceptions.RuntimeIOException;
+import java.util.concurrent.atomic.AtomicBoolean;
+import java.util.function.Supplier;
 import org.apache.iceberg.io.CloseableGroup;
 import org.apache.iceberg.io.CloseableIterable;
 import org.apache.iceberg.io.CloseableIterator;
 import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
 import org.apache.iceberg.relocated.com.google.common.collect.Iterables;
+import org.apache.iceberg.relocated.com.google.common.io.Closer;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
 
 public class ParallelIterable<T> extends CloseableGroup implements 
CloseableIterable<T> {
+
+  private static final Logger LOG = 
LoggerFactory.getLogger(ParallelIterable.class);
+
+  // Logic behind default value: ParallelIterable is often used for file 
planning.
+  // Assuming that a DataFile or DeleteFile is about 500 bytes, a 30k limit 
uses 14.3 MB of memory.
+  private static final int DEFAULT_MAX_QUEUE_SIZE = 30_000;

Review Comment:
   > I disagree with "the lowest number possible that doesn't affect 
performance negatively" because we don't know what the affect on performance 
is. Until we do, we should be more conservative.
   
   I think we actually agree, despite wording syntactically suggesting 
otherwise.
   
   IF we knew the performance impact THEN we would go with "the lowest number 
possible that doesn't affect performance negatively".
   But we _don't know_ and we try to be conservative and 30k may be a good 
conservative value. (Maybe 10k was also a good conservative value. Maybe none 
of them is -- we don't know for sure,)
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Reply via email to