javrasya commented on issue #9444: URL: https://github.com/apache/iceberg/issues/9444#issuecomment-2005133969
The code I shared earlier, @shanzi, caused some data loss. It did not close the currently open stream, so be careful with it. I am sorry to see that my buggy code above is spreading. I now have a safe version, which I adapted (largely copied and pasted) from the PR @amogh-jahagirdar mentioned. Here is the custom S3InputFile I have been using in my Flink and Spark projects. I have tested it with data at scale: the data loss problem went away completely, and I no longer get the socket closed exception.

https://gist.github.com/javrasya/76ad0267399e379f5801a6d75c09882a

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org