kroepke commented on issue #13508:
URL: https://github.com/apache/iceberg/issues/13508#issuecomment-3117834513

   Hi all!
   
   I've had another look at this with @fpetersen-gl today. First, let me give 
you a brief description of how this played out in our code, and then suggest a 
few possible ways to fix/mitigate it.
   
   In our code, we regularly flush a map of tables using fan-out writers from a 
single thread. We do that in a simple for loop with an exception handler for 
`IOException` (point 1).
   We run `channel.writer.close();` which can fail in the manner described 
above (channel is our wrapper around fan-out writers).
   Unfortunately, a lot of code also throws `UncheckedIOException`, which our 
code calling `Closeable#close()` didn't expect since it already has a signature 
with the checked `IOException`, thus it missed handling it.
   That part is a bug in our code (point 2), leading to an inconsistent state 
of its channel map, with some of them closed already, the one that threw the 
exception marked as closed (point 3), and the others still being writable.
   At the next flush interval in our code, we'd use the previous channel map 
and run `close()` on each fan-out writer again.
   Let's suppose the intermittent network IO that cause the exception is fixed 
now, all `close()` calls will successfully return, even the ones that are 
already closed in the first loop.
   
   Long story short, I believe that Iceberg's API doesn't expect anyone to run 
`close()` more than once, and due to the ordering of setting the closed flag 
like @fpetersen-gl described, it will silently return even if it has 
encountered an error from its underlying writer.
   
   I think our bug, point 2, would've been much easier to spot if point 3 would 
throw an `IllegalStateException` or something like (e.g. `java.util.Stream` 
does this when you interact with a closed instance) when it is being used after 
close.
   Silently continuing masks important errors, and in our case, leads to data 
loss. I couldn't find anything in the javadocs or anywhere else that makes it 
clear that re-closing something will lead to bad outcomes, imho that should be 
fixed.
   In particular, it would be helpful if these codepaths would throw 
`IllegalStateExceptions` to make it less likely to misuse the library.
   
   I'm also unclear as to what the point of throwing unchecked exceptions from 
methods that declare the checked version of it is, it makes it very hard to 
reason about, I think that should be fixed as well.
   
   Looking forward to opinions on this!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Reply via email to