findepi commented on code in PR #10691: URL: https://github.com/apache/iceberg/pull/10691#discussion_r1682523817
########## core/src/main/java/org/apache/iceberg/util/ParallelIterable.java: ########## @@ -88,16 +92,26 @@ private ParallelIterator( @Override public void close() { // close first, avoid new task submit - this.closed = true; + this.closed.set(true); + + try (Closer closer = Closer.create()) { + yieldedTasks.forEach(closer::register); + yieldedTasks.clear(); - // cancel background tasks - for (Future<?> taskFuture : taskFutures) { - if (taskFuture != null && !taskFuture.isDone()) { - taskFuture.cancel(true); + // TODO close input iterables that were not started yet Review Comment: > The existing code use a try-with-resources for the iterable and used a `for` loop. I do believe that the code in this PR is semantically equivalent, so whatever assumptions/contract there were, they should not be violated by these changes. > The way our iterables should work is that there is no need to close the iterable itself (i.e. no resources it holds) but it is a convenient way to close any iterators that were opened from the iterable, which do hold resources. i find this mental model harder to reason about. It's the iterable who is checked for being Closeable, so I would feel obliged to call close on the instance that was given to me, for me to own it. But it's not the place to debate this model. I will remove the TODO. > We do need to make sure the yielded tasks or currently running tasks are closed. I think that closing the yielded tasks above and cancelling the running Task instances should take care of that. The running instances should hit the catch for Throwable that calls close. Yes, mostly, but there is a race condition. (a) a task may be about to yield or (b) already yielded but not moved to `yieldedTasks` collection yet. In both cases `taskFuture.cancel(true)` won't cause the closing of the iterable in the task. What the solution should be depends on whether the Task can be closed asyncly. If yes, it would makes things easier, but i don't think we can make such an assumption. I will introduce timed-wait inside the close to wait for the tasks continuations and close them. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org