kroepke commented on issue #13508: URL: https://github.com/apache/iceberg/issues/13508#issuecomment-3117834513
Hi all! I've had another look at this with @fpetersen-gl today. First, let me give you a brief description of how this played out in our code, and then suggest a few possible ways to fix/mitigate it. In our code, we regularly flush a map of tables using fan-out writers from a single thread. We do that in a simple for loop with an exception handler for `IOException` (point 1). We run `channel.writer.close();` which can fail in the manner described above (channel is our wrapper around fan-out writers). Unfortunately, a lot of code also throws `UncheckedIOException`, which our code calling `Closeable#close()` didn't expect since it already has a signature with the checked `IOException`, thus it missed handling it. That part is a bug in our code (point 2), leading to an inconsistent state of its channel map, with some of them closed already, the one that threw the exception marked as closed (point 3), and the others still being writable. At the next flush interval in our code, we'd use the previous channel map and run `close()` on each fan-out writer again. Let's suppose the intermittent network IO that cause the exception is fixed now, all `close()` calls will successfully return, even the ones that are already closed in the first loop. Long story short, I believe that Iceberg's API doesn't expect anyone to run `close()` more than once, and due to the ordering of setting the closed flag like @fpetersen-gl described, it will silently return even if it has encountered an error from its underlying writer. I think our bug, point 2, would've been much easier to spot if point 3 would throw an `IllegalStateException` or something like (e.g. `java.util.Stream` does this when you interact with a closed instance) when it is being used after close. Silently continuing masks important errors, and in our case, leads to data loss. I couldn't find anything in the javadocs or anywhere else that makes it clear that re-closing something will lead to bad outcomes, imho that should be fixed. In particular, it would be helpful if these codepaths would throw `IllegalStateExceptions` to make it less likely to misuse the library. I'm also unclear as to what the point of throwing unchecked exceptions from methods that declare the checked version of it is, it makes it very hard to reason about, I think that should be fixed as well. Looking forward to opinions on this! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org