findepi commented on code in PR #10691:
URL: https://github.com/apache/iceberg/pull/10691#discussion_r1682523817


##########
core/src/main/java/org/apache/iceberg/util/ParallelIterable.java:
##########
@@ -88,16 +92,26 @@ private ParallelIterator(
     @Override
     public void close() {
       // close first, avoid new task submit
-      this.closed = true;
+      this.closed.set(true);
+
+      try (Closer closer = Closer.create()) {
+        yieldedTasks.forEach(closer::register);
+        yieldedTasks.clear();
 
-      // cancel background tasks
-      for (Future<?> taskFuture : taskFutures) {
-        if (taskFuture != null && !taskFuture.isDone()) {
-          taskFuture.cancel(true);
+        // TODO close input iterables that were not started yet

Review Comment:
   > The existing code use a try-with-resources for the iterable and used a 
`for` loop. 
   
   I do believe that the code in this PR is semantically equivalent, so 
whatever assumptions/contract there were, they should not be violated by these 
changes.
   
   > The way our iterables should work is that there is no need to close the 
iterable itself (i.e. no resources it holds) but it is a convenient way to 
close any iterators that were opened from the iterable, which do hold resources.
   
   i find this mental model harder to reason about.
   It's the iterable who is checked for being Closeable, so I would feel 
obliged to call close on the instance that was given to me, for me to own it.
   But it's not the place to debate this model. I will remove the TODO. 
   
   > We do need to make sure the yielded tasks or currently running tasks are 
closed. I think that closing the yielded tasks above and cancelling the running 
Task instances should take care of that. The running instances should hit the 
catch for Throwable that calls close.
   
   Yes, mostly, but there is a race condition.
   (a) a task may be about to yield or (b) already yielded but not moved to 
`yieldedTasks` collection yet.
   In both cases `taskFuture.cancel(true)` won't cause the closing of the 
iterable in the task.
   
   What the solution should be depends on whether the Task can be closed 
asyncly. If yes, it would makes things easier, but i don't think we can make 
such an assumption.
   
   I will introduce timed-wait inside the close to wait for the tasks 
continuations and close them.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Reply via email to