huyuanfeng2018 opened a new pull request, #7696:
URL: https://github.com/apache/iceberg/pull/7696
When writing to an Iceberg table with Flink, I added a wrapper layer around the
output stream; the wrapper is eventually converted into a
HadoopPositionOutputStream object. HadoopPositionOutputStream implements
DelegatingOutputStream and overrides the finalize method, which attempts to
close the stream. This causes a problem in ParquetIO:
```java
static PositionOutputStream stream(org.apache.iceberg.io.PositionOutputStream stream) {
  if (stream instanceof DelegatingOutputStream) {
    OutputStream wrapped = ((DelegatingOutputStream) stream).getDelegate();
    if (wrapped instanceof FSDataOutputStream) {
      return HadoopStreams.wrap((FSDataOutputStream) wrapped);
    }
  }
  return new ParquetOutputStreamAdapter(stream);
}
```
Here the delegate is extracted and re-wrapped, and the
HadoopPositionOutputStream object is discarded. From that point on, the
HadoopPositionOutputStream may no longer be strongly referenced and can be
reclaimed by GC. If GC collects it while the stream is still being written,
its finalize method closes the underlying stream, corrupting the file.
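To illustrate the hazard, here is a minimal sketch of a finalizer-based wrapper like the one described above. The class name `ClosingWrapper` and its shape are hypothetical; the point is that once the delegate is unwrapped and the wrapper dropped, a later GC-triggered `finalize()` closes a stream that someone else is still writing to.

```java
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical wrapper modeled on the HadoopPositionOutputStream behavior
// described above: finalize() closes the delegate as a "safety net".
class ClosingWrapper extends OutputStream {
  private final OutputStream delegate;

  ClosingWrapper(OutputStream delegate) {
    this.delegate = delegate;
  }

  OutputStream getDelegate() {
    return delegate;
  }

  @Override
  public void write(int b) throws IOException {
    delegate.write(b);
  }

  @Override
  protected void finalize() throws Throwable {
    // This is exactly what becomes dangerous: after ParquetIO unwraps the
    // delegate and discards this wrapper, GC may run this close() while
    // the delegate is still being written through the new adapter.
    delegate.close();
    super.finalize();
  }
}
```

Note that nothing keeps the wrapper alive after `getDelegate()` returns, so the close can race with live writes on the delegate.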
This PR solves the problem by adding takeDelegate to DelegatingOutputStream:
it hands out the delegate and sets the field to null, so the finalize method
of the discarded wrapper can no longer close the stream.
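The contract described above can be sketched as follows. The method name `takeDelegate` comes from this PR; the surrounding class is a hypothetical stand-in, not the actual Iceberg implementation. The idea is an ownership transfer: once the delegate is taken, the finalizer sees null and skips the close.

```java
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical wrapper sketching the proposed takeDelegate() contract:
// ownership of the wrapped stream is transferred out exactly once, after
// which finalize() has nothing left to close.
class TransferableWrapper extends OutputStream {
  private OutputStream delegate;

  TransferableWrapper(OutputStream delegate) {
    this.delegate = delegate;
  }

  // Hand the delegate to the caller and give up ownership of it.
  OutputStream takeDelegate() {
    OutputStream result = delegate;
    delegate = null;
    return result;
  }

  @Override
  public void write(int b) throws IOException {
    delegate.write(b);
  }

  @Override
  protected void finalize() throws Throwable {
    if (delegate != null) { // ownership transferred: nothing to close
      delegate.close();
    }
    super.finalize();
  }
}
```

With this shape, the unwrapping code in ParquetIO would call `takeDelegate()` instead of `getDelegate()`, so the discarded wrapper's finalizer becomes a no-op.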
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]