cgpoh opened a new issue, #6606:
URL: https://github.com/apache/iceberg/issues/6606
### Apache Iceberg version
1.1.0 (latest release)
### Query engine
Flink
### Please describe the bug 🐞
Operating environment:

- Flink 1.15.2
- Iceberg 1.1.0
- Hadoop AWS 2.10.1
- MinIO S3 storage

When running a Flink job that streams from an Iceberg table, after a few hours Flink throws the following exception and is unable to restart the job:
```
2023-01-17 05:34:55,864 WARN  org.apache.iceberg.util.Tasks [] - Retrying task after failure: Failed to open input stream for file: s3a://recordings/raw_2019/fpl/metadata/04726-837067cc-0731-44a9-a99c-68b4c7c9e8f8.metadata.json
org.apache.iceberg.exceptions.RuntimeIOException: Failed to open input stream for file: s3a://recordings/raw_2019/fpl/metadata/04726-837067cc-0731-44a9-a99c-68b4c7c9e8f8.metadata.json
    at org.apache.iceberg.hadoop.HadoopInputFile.newStream(HadoopInputFile.java:187) ~[blob_p-ef9f7a65353e12b1d19241c408ca3df4fbd64570-24f467b241db1d55f5345b551c2cf4ed:?]
    at org.apache.iceberg.TableMetadataParser.read(TableMetadataParser.java:273) ~[blob_p-ef9f7a65353e12b1d19241c408ca3df4fbd64570-24f467b241db1d55f5345b551c2cf4ed:?]
    at org.apache.iceberg.TableMetadataParser.read(TableMetadataParser.java:267) ~[blob_p-ef9f7a65353e12b1d19241c408ca3df4fbd64570-24f467b241db1d55f5345b551c2cf4ed:?]
    at org.apache.iceberg.BaseMetastoreTableOperations.lambda$refreshFromMetadataLocation$0(BaseMetastoreTableOperations.java:183) ~[blob_p-ef9f7a65353e12b1d19241c408ca3df4fbd64570-24f467b241db1d55f5345b551c2cf4ed:?]
    at org.apache.iceberg.BaseMetastoreTableOperations.lambda$refreshFromMetadataLocation$1(BaseMetastoreTableOperations.java:202) ~[blob_p-ef9f7a65353e12b1d19241c408ca3df4fbd64570-24f467b241db1d55f5345b551c2cf4ed:?]
    at org.apache.iceberg.util.Tasks$Builder.runTaskWithRetry(Tasks.java:402) ~[blob_p-ef9f7a65353e12b1d19241c408ca3df4fbd64570-24f467b241db1d55f5345b551c2cf4ed:?]
    at org.apache.iceberg.util.Tasks$Builder.runSingleThreaded(Tasks.java:212) ~[blob_p-ef9f7a65353e12b1d19241c408ca3df4fbd64570-24f467b241db1d55f5345b551c2cf4ed:?]
    at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:196) ~[blob_p-ef9f7a65353e12b1d19241c408ca3df4fbd64570-24f467b241db1d55f5345b551c2cf4ed:?]
    at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:189) ~[blob_p-ef9f7a65353e12b1d19241c408ca3df4fbd64570-24f467b241db1d55f5345b551c2cf4ed:?]
    at org.apache.iceberg.BaseMetastoreTableOperations.refreshFromMetadataLocation(BaseMetastoreTableOperations.java:202) ~[blob_p-ef9f7a65353e12b1d19241c408ca3df4fbd64570-24f467b241db1d55f5345b551c2cf4ed:?]
    at org.apache.iceberg.BaseMetastoreTableOperations.refreshFromMetadataLocation(BaseMetastoreTableOperations.java:179) ~[blob_p-ef9f7a65353e12b1d19241c408ca3df4fbd64570-24f467b241db1d55f5345b551c2cf4ed:?]
    at org.apache.iceberg.BaseMetastoreTableOperations.refreshFromMetadataLocation(BaseMetastoreTableOperations.java:174) ~[blob_p-ef9f7a65353e12b1d19241c408ca3df4fbd64570-24f467b241db1d55f5345b551c2cf4ed:?]
    at org.apache.iceberg.hive.HiveTableOperations.doRefresh(HiveTableOperations.java:243) ~[blob_p-ef9f7a65353e12b1d19241c408ca3df4fbd64570-24f467b241db1d55f5345b551c2cf4ed:?]
    at org.apache.iceberg.BaseMetastoreTableOperations.refresh(BaseMetastoreTableOperations.java:97) ~[blob_p-ef9f7a65353e12b1d19241c408ca3df4fbd64570-24f467b241db1d55f5345b551c2cf4ed:?]
    at org.apache.iceberg.BaseMetastoreTableOperations.current(BaseMetastoreTableOperations.java:80) ~[blob_p-ef9f7a65353e12b1d19241c408ca3df4fbd64570-24f467b241db1d55f5345b551c2cf4ed:?]
    at org.apache.iceberg.BaseMetastoreCatalog.loadTable(BaseMetastoreCatalog.java:44) ~[blob_p-ef9f7a65353e12b1d19241c408ca3df4fbd64570-24f467b241db1d55f5345b551c2cf4ed:?]
    at org.apache.iceberg.flink.TableLoader$CatalogTableLoader.loadTable(TableLoader.java:109) ~[blob_p-ef9f7a65353e12b1d19241c408ca3df4fbd64570-24f467b241db1d55f5345b551c2cf4ed:?]
    at org.apache.iceberg.flink.source.IcebergSource.lazyTable(IcebergSource.java:125) ~[blob_p-ef9f7a65353e12b1d19241c408ca3df4fbd64570-24f467b241db1d55f5345b551c2cf4ed:?]
    at org.apache.iceberg.flink.source.IcebergSource.createReader(IcebergSource.java:142) ~[blob_p-ef9f7a65353e12b1d19241c408ca3df4fbd64570-24f467b241db1d55f5345b551c2cf4ed:?]
    at org.apache.flink.streaming.api.operators.SourceOperator.initReader(SourceOperator.java:286) ~[flink-dist-1.15.2.jar:1.15.2]
    at org.apache.flink.streaming.runtime.tasks.SourceOperatorStreamTask.init(SourceOperatorStreamTask.java:94) ~[flink-dist-1.15.2.jar:1.15.2]
    at org.apache.flink.streaming.runtime.tasks.StreamTask.restoreInternal(StreamTask.java:666) ~[flink-dist-1.15.2.jar:1.15.2]
    at org.apache.flink.streaming.runtime.tasks.StreamTask.restore(StreamTask.java:643) ~[flink-dist-1.15.2.jar:1.15.2]
    at org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:948) [flink-dist-1.15.2.jar:1.15.2]
    at org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:917) [flink-dist-1.15.2.jar:1.15.2]
    at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:741) [flink-dist-1.15.2.jar:1.15.2]
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:563) [flink-dist-1.15.2.jar:1.15.2]
    at java.lang.Thread.run(Unknown Source) [?:?]
Caused by: java.io.InterruptedIOException: getFileStatus on s3a://recordings/raw_2019/fpl/metadata/04726-837067cc-0731-44a9-a99c-68b4c7c9e8f8.metadata.json: com.amazonaws.SdkClientException: Unable to execute HTTP request: Timeout waiting for connection from pool
    at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:141) ~[blob_p-ef9f7a65353e12b1d19241c408ca3df4fbd64570-24f467b241db1d55f5345b551c2cf4ed:?]
    at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:117) ~[blob_p-ef9f7a65353e12b1d19241c408ca3df4fbd64570-24f467b241db1d55f5345b551c2cf4ed:?]
    at org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:1926) ~[blob_p-ef9f7a65353e12b1d19241c408ca3df4fbd64570-24f467b241db1d55f5345b551c2cf4ed:?]
    at org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:1876) ~[blob_p-ef9f7a65353e12b1d19241c408ca3df4fbd64570-24f467b241db1d55f5345b551c2cf4ed:?]
    at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:1812) ~[blob_p-ef9f7a65353e12b1d19241c408ca3df4fbd64570-24f467b241db1d55f5345b551c2cf4ed:?]
    at org.apache.hadoop.fs.s3a.S3AFileSystem.open(S3AFileSystem.java:611) ~[blob_p-ef9f7a65353e12b1d19241c408ca3df4fbd64570-24f467b241db1d55f5345b551c2cf4ed:?]
    at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:787) ~[flink-shaded-hadoop-2-uber-2.8.3-10.0.jar:2.8.3-10.0]
    at org.apache.iceberg.hadoop.HadoopInputFile.newStream(HadoopInputFile.java:183) ~[blob_p-ef9f7a65353e12b1d19241c408ca3df4fbd64570-24f467b241db1d55f5345b551c2cf4ed:?]
    ... 27 more
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:699) ~[blob_p-ef9f7a65353e12b1d19241c408ca3df4fbd64570-24f467b241db1d55f5345b551c2cf4ed:?]
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:667) ~[blob_p-ef9f7a65353e12b1d19241c408ca3df4fbd64570-24f467b241db1d55f5345b551c2cf4ed:?]
    at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:649) ~[blob_p-ef9f7a65353e12b1d19241c408ca3df4fbd64570-24f467b241db1d55f5345b551c2cf4ed:?]
    at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:513) ~[blob_p-ef9f7a65353e12b1d19241c408ca3df4fbd64570-24f467b241db1d55f5345b551c2cf4ed:?]
    at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4325) ~[blob_p-ef9f7a65353e12b1d19241c408ca3df4fbd64570-24f467b241db1d55f5345b551c2cf4ed:?]
    at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4272) ~[blob_p-ef9f7a65353e12b1d19241c408ca3df4fbd64570-24f467b241db1d55f5345b551c2cf4ed:?]
    at com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:1264) ~[blob_p-ef9f7a65353e12b1d19241c408ca3df4fbd64570-24f467b241db1d55f5345b551c2cf4ed:?]
    at org.apache.hadoop.fs.s3a.S3AFileSystem.getObjectMetadata(S3AFileSystem.java:1086) ~[blob_p-ef9f7a65353e12b1d19241c408ca3df4fbd64570-24f467b241db1d55f5345b551c2cf4ed:?]
    at org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:1912) ~[blob_p-ef9f7a65353e12b1d19241c408ca3df4fbd64570-24f467b241db1d55f5345b551c2cf4ed:?]
    at org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:1876) ~[blob_p-ef9f7a65353e12b1d19241c408ca3df4fbd64570-24f467b241db1d55f5345b551c2cf4ed:?]
    at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:1812) ~[blob_p-ef9f7a65353e12b1d19241c408ca3df4fbd64570-24f467b241db1d55f5345b551c2cf4ed:?]
    at org.apache.hadoop.fs.s3a.S3AFileSystem.open(S3AFileSystem.java:611) ~[blob_p-ef9f7a65353e12b1d19241c408ca3df4fbd64570-24f467b241db1d55f5345b551c2cf4ed:?]
    at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:787) ~[flink-shaded-hadoop-2-uber-2.8.3-10.0.jar:2.8.3-10.0]
    at org.apache.iceberg.hadoop.HadoopInputFile.newStream(HadoopInputFile.java:183) ~[blob_p-ef9f7a65353e12b1d19241c408ca3df4fbd64570-24f467b241db1d55f5345b551c2cf4ed:?]
    ... 27 more
Caused by: com.amazonaws.thirdparty.apache.http.conn.ConnectionPoolTimeoutException: Timeout waiting for connection from pool
    at com.amazonaws.thirdparty.apache.http.impl.conn.PoolingHttpClientConnectionManager.leaseConnection(PoolingHttpClientConnectionManager.java:286) ~[blob_p-ef9f7a65353e12b1d19241c408ca3df4fbd64570-24f467b241db1d55f5345b551c2cf4ed:?]
    at com.amazonaws.thirdparty.apache.http.impl.conn.PoolingHttpClientConnectionManager$1.get(PoolingHttpClientConnectionManager.java:263) ~[blob_p-ef9f7a65353e12b1d19241c408ca3df4fbd64570-24f467b241db1d55f5345b551c2cf4ed:?]
    at jdk.internal.reflect.GeneratedMethodAccessor21.invoke(Unknown Source) ~[?:?]
    at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) ~[?:?]
    at java.lang.reflect.Method.invoke(Unknown Source) ~[?:?]
    at com.amazonaws.http.conn.ClientConnectionRequestFactory$Handler.invoke(ClientConnectionRequestFactory.java:70) ~[blob_p-ef9f7a65353e12b1d19241c408ca3df4fbd64570-24f467b241db1d55f5345b551c2cf4ed:?]
    at com.amazonaws.http.conn.$Proxy61.get(Unknown Source) ~[?:?]
    at com.amazonaws.thirdparty.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:190) ~[blob_p-ef9f7a65353e12b1d19241c408ca3df4fbd64570-24f467b241db1d55f5345b551c2cf4ed:?]
    at com.amazonaws.thirdparty.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:184) ~[blob_p-ef9f7a65353e12b1d19241c408ca3df4fbd64570-24f467b241db1d55f5345b551c2cf4ed:?]
    at com.amazonaws.thirdparty.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:184) ~[blob_p-ef9f7a65353e12b1d19241c408ca3df4fbd64570-24f467b241db1d55f5345b551c2cf4ed:?]
    at com.amazonaws.thirdparty.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82) ~[blob_p-ef9f7a65353e12b1d19241c408ca3df4fbd64570-24f467b241db1d55f5345b551c2cf4ed:?]
    at com.amazonaws.thirdparty.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:55) ~[blob_p-ef9f7a65353e12b1d19241c408ca3df4fbd64570-24f467b241db1d55f5345b551c2cf4ed:?]
    at com.amazonaws.http.apache.client.impl.SdkHttpClient.execute(SdkHttpClient.java:72) ~[blob_p-ef9f7a65353e12b1d19241c408ca3df4fbd64570-24f467b241db1d55f5345b551c2cf4ed:?]
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1236) ~[blob_p-ef9f7a65353e12b1d19241c408ca3df4fbd64570-24f467b241db1d55f5345b551c2cf4ed:?]
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1056) ~[blob_p-ef9f7a65353e12b1d19241c408ca3df4fbd64570-24f467b241db1d55f5345b551c2cf4ed:?]
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:743) ~[blob_p-ef9f7a65353e12b1d19241c408ca3df4fbd64570-24f467b241db1d55f5345b551c2cf4ed:?]
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:717) ~[blob_p-ef9f7a65353e12b1d19241c408ca3df4fbd64570-24f467b241db1d55f5345b551c2cf4ed:?]
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:699) ~[blob_p-ef9f7a65353e12b1d19241c408ca3df4fbd64570-24f467b241db1d55f5345b551c2cf4ed:?]
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:667) ~[blob_p-ef9f7a65353e12b1d19241c408ca3df4fbd64570-24f467b241db1d55f5345b551c2cf4ed:?]
    at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:649) ~[blob_p-ef9f7a65353e12b1d19241c408ca3df4fbd64570-24f467b241db1d55f5345b551c2cf4ed:?]
    at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:513) ~[blob_p-ef9f7a65353e12b1d19241c408ca3df4fbd64570-24f467b241db1d55f5345b551c2cf4ed:?]
    at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4325) ~[blob_p-ef9f7a65353e12b1d19241c408ca3df4fbd64570-24f467b241db1d55f5345b551c2cf4ed:?]
    at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4272) ~[blob_p-ef9f7a65353e12b1d19241c408ca3df4fbd64570-24f467b241db1d55f5345b551c2cf4ed:?]
    at com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:1264) ~[blob_p-ef9f7a65353e12b1d19241c408ca3df4fbd64570-24f467b241db1d55f5345b551c2cf4ed:?]
    at org.apache.hadoop.fs.s3a.S3AFileSystem.getObjectMetadata(S3AFileSystem.java:1086) ~[blob_p-ef9f7a65353e12b1d19241c408ca3df4fbd64570-24f467b241db1d55f5345b551c2cf4ed:?]
    at org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:1912) ~[blob_p-ef9f7a65353e12b1d19241c408ca3df4fbd64570-24f467b241db1d55f5345b551c2cf4ed:?]
    at org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:1876) ~[blob_p-ef9f7a65353e12b1d19241c408ca3df4fbd64570-24f467b241db1d55f5345b551c2cf4ed:?]
    at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:1812) ~[blob_p-ef9f7a65353e12b1d19241c408ca3df4fbd64570-24f467b241db1d55f5345b551c2cf4ed:?]
    at org.apache.hadoop.fs.s3a.S3AFileSystem.open(S3AFileSystem.java:611) ~[blob_p-ef9f7a65353e12b1d19241c408ca3df4fbd64570-24f467b241db1d55f5345b551c2cf4ed:?]
    at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:787) ~[flink-shaded-hadoop-2-uber-2.8.3-10.0.jar:2.8.3-10.0]
    at org.apache.iceberg.hadoop.HadoopInputFile.newStream(HadoopInputFile.java:183) ~[blob_p-ef9f7a65353e12b1d19241c408ca3df4fbd64570-24f467b241db1d55f5345b551c2cf4ed:?]
    ... 27 more
```
Setting `s3.connection.maximum` to 100 in the Flink configuration does not help.
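For what it's worth, the failure surfaces inside the Hadoop S3A client (`org.apache.hadoop.fs.s3a.S3AFileSystem`), whose HTTP connection pool is sized by Hadoop properties; the `s3.*` keys in `flink-conf.yaml` configure Flink's own S3 filesystem plugins, so they may never reach the code path in this stack trace. A minimal sketch of the Hadoop-side equivalents, assuming Iceberg's `HadoopFileIO` picks them up from `core-site.xml` (the values are illustrative, not recommendations):

```xml
<!-- core-site.xml: illustrative S3A connection-pool settings -->
<configuration>
  <!-- Upper bound on simultaneous connections in the S3A HTTP pool;
       "Timeout waiting for connection from pool" fires when all of
       these are in use and none is released in time. -->
  <property>
    <name>fs.s3a.connection.maximum</name>
    <value>200</value>
  </property>

  <!-- Keep the S3A worker thread pool below the connection limit so
       threads are not left queuing for connections indefinitely. -->
  <property>
    <name>fs.s3a.threads.max</name>
    <value>64</value>
  </property>
</configuration>
```

If the pool still exhausts over time at any limit, that would point to connections being leaked (for example, input streams opened but never closed) rather than a pool that is merely too small.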