stevenzwu commented on code in PR #10691: URL: https://github.com/apache/iceberg/pull/10691#discussion_r1676050363
##########
core/src/main/java/org/apache/iceberg/util/ParallelIterable.java:
##########
@@ -88,7 +91,18 @@ private ParallelIterator(
     @Override
     public void close() {
       // close first, avoid new task submit
-      this.closed = true;
+      this.closed.set(true);
+
+      for (Task<T> task : yieldedTasks) {
+        try {
+          task.close();
+        } catch (Exception e) {
+          throw new RuntimeException("Close failed", e);

Review Comment:
We may want to finish the close loop even if a task fails in the middle of it. Should we use the `Tasks` util here? Here is an example from `CatalogUtil::deleteFile`:

```
Tasks.foreach(files)
    .executeWith(ThreadPools.getWorkerPool())
    .noRetry()
    .suppressFailureWhenFinished()
    .onFailure((file, exc) -> LOG.warn("Failed to delete {} file {}", type, file, exc))
    .run(io::deleteFile);
```

##########
core/src/main/java/org/apache/iceberg/util/ParallelIterable.java:
##########
@@ -192,4 +209,65 @@ public synchronized T next() {
       return queue.poll();
     }
   }
+
+  private static class Task<T> implements Callable<Optional<Task<T>>>, AutoCloseable {
+    private final Iterable<T> input;
+    private final ConcurrentLinkedQueue<T> queue;
+    private final AtomicBoolean closed;
+    private final int approximateMaxQueueSize;
+
+    private Iterator<T> iterator;
+
+    Task(
+        Iterable<T> input,
+        ConcurrentLinkedQueue<T> queue,
+        AtomicBoolean closed,
+        int approximateMaxQueueSize) {
+      this.input = Preconditions.checkNotNull(input, "input cannot be null");
+      this.queue = Preconditions.checkNotNull(queue, "queue cannot be null");
+      this.closed = Preconditions.checkNotNull(closed, "closed cannot be null");
+      this.approximateMaxQueueSize = approximateMaxQueueSize;
+    }
+
+    @Override
+    public Optional<Task<T>> call() throws Exception {
+      try {
+        if (iterator == null) {
+          iterator = input.iterator();
+        }
+        while (iterator.hasNext()) {
+          if (queue.size() >= approximateMaxQueueSize) {
+            // yield
+            return Optional.of(this);
+          }
+          T next = iterator.next();

Review Comment:
Shouldn't item retrieval happen after the `closed` check? Otherwise, we may lose the item if the loop breaks.
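A minimal, single-threaded sketch of both review suggestions follows. It is not the PR's code: the class `ParallelIterableSketch` and the helper `closeAll` are hypothetical, `Task` is simplified (no executor, no `Callable`/`AutoCloseable` plumbing), and `java.util.Objects` stands in for the relocated Guava `Preconditions`. `call()` checks `closed` at the top of the loop, before `iterator.next()`, so a task that stops on close leaves the next element in the iterator instead of pulling it and discarding it. `closeAll` keeps closing after a failure. Note one deliberate difference: the `Tasks` example in the comment logs and suppresses each failure, whereas this sketch rethrows the first failure (with later ones attached via `addSuppressed`) once every resource has been closed.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Objects;
import java.util.Optional;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical, single-threaded sketch of the two review suggestions; not the PR's code.
class ParallelIterableSketch {

  // Simplified stand-in for the PR's Task: copies elements from an iterator into a shared
  // queue and yields (returns itself) when the queue reaches the size limit.
  static class Task<T> {
    private final Iterable<T> input;
    private final ConcurrentLinkedQueue<T> queue;
    private final AtomicBoolean closed;
    private final int approximateMaxQueueSize;
    private Iterator<T> iterator;

    Task(Iterable<T> input, ConcurrentLinkedQueue<T> queue, AtomicBoolean closed, int maxSize) {
      this.input = Objects.requireNonNull(input, "input cannot be null");
      this.queue = Objects.requireNonNull(queue, "queue cannot be null");
      this.closed = Objects.requireNonNull(closed, "closed cannot be null");
      this.approximateMaxQueueSize = maxSize;
    }

    Optional<Task<T>> call() {
      if (iterator == null) {
        iterator = input.iterator();
      }
      while (iterator.hasNext()) {
        // Check `closed` BEFORE next(): stopping never drops an already-pulled element.
        if (closed.get()) {
          return Optional.empty();
        }
        if (queue.size() >= approximateMaxQueueSize) {
          return Optional.of(this); // yield; a later call resumes from the same position
        }
        queue.add(iterator.next());
      }
      return Optional.empty();
    }
  }

  // Closes every resource even if some close() calls fail, then rethrows the first
  // failure with any later failures attached as suppressed exceptions.
  static void closeAll(List<? extends AutoCloseable> resources) {
    RuntimeException failure = null;
    for (AutoCloseable resource : resources) {
      try {
        resource.close();
      } catch (Exception e) {
        if (failure == null) {
          failure = new RuntimeException("Close failed", e);
        } else {
          failure.addSuppressed(e);
        }
      }
    }
    if (failure != null) {
      throw failure;
    }
  }

  public static void main(String[] args) {
    ConcurrentLinkedQueue<Integer> queue = new ConcurrentLinkedQueue<>();
    AtomicBoolean closed = new AtomicBoolean(false);
    Iterator<Integer> source = List.of(1, 2, 3, 4, 5).iterator();
    Iterable<Integer> input = () -> source; // always hands out the same iterator
    Task<Integer> task = new Task<>(input, queue, closed, 2);

    System.out.println(task.call().isPresent()); // true: yielded with queue = [1, 2]
    queue.clear();
    closed.set(true);
    System.out.println(task.call().isPresent()); // false: stopped because of close
    System.out.println(source.next()); // 3: not consumed by the closed task

    List<String> log = new ArrayList<>();
    List<AutoCloseable> resources =
        List.of(() -> { log.add("a"); throw new Exception("a failed"); }, () -> log.add("b"));
    try {
      closeAll(resources);
    } catch (RuntimeException e) {
      System.out.println(log + " " + e.getCause().getMessage()); // [a, b] a failed
    }
  }
}
```

In the real class, the same ordering would mean moving the `closed` check ahead of `T next = iterator.next();`, and `close()` would need to decide whether a close failure is logged (as with `suppressFailureWhenFinished()`) or surfaced to the caller.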