huaxiangsun opened a new issue, #8218:
URL: https://github.com/apache/iceberg/issues/8218

   ### Apache Iceberg version
   
   1.0.0
   
   ### Query engine
   
   Spark
   
   ### Please describe the bug 🐞
   
   We run into a weird Aws S3 exceptions "Unable to execute HTTP request: The 
target server failed to respond"  during table merge, which causes lots of task 
failures. 
   
   Here is our case in detail. We have one table with a certain # of 
partitions. Initially, each partition has 1 DataFile with ~2GB.
   It is a v2 table format and we use Merge-on-read to do daily table 
update(table merge), using Sparks' HashShuffleJoin with Storage Partition Join 
enabled. All data is at aws s3.
   
   For daily update, there are some updates and some new inserts, no deletes.
   It starts with a DataFile0
   After day 1's table update (using table merge), it creates 1 DeleteFIle-d1, 
1 DataFile-d1-update (for update) and 1 DataFile-d1-insert (for new inserts).
   After day 2's table update (using table merge), it creates 1 DeleteFIle-d2,  
1 DataFile-d2-update (for update) and 1 DataFile-d2-insert (for new inserts).
   After day 3's table update (using table merge), it creates 1 DeleteFIle-d3,  
1 DataFile-d3-update (for update) and 1 DataFile-d3-insert (for new inserts).
   
   All went ok though there are a very small number of the aws s3 exceptions 
mentioned above.
   
   For day 4's table update (using table merge), For each executor, after the 
first set of tasks finished and it picks up a new set of tasks, for these new 
tasks, we observed a huge number of the following aws s3 exceptions which 
caused task to fail. We suspected it is related with number of  DataFiles and 
DeleteFiles as there are 7 DataFiles and 1 DeleteFile for one partition. But we 
could not figure out the root cause, not sure if this is caused by s3 backend 
or aws sdk client side. 
   
   Attached is the exception stack. 
   
   ```
   23/07/27 17:30:05 ERROR BaseReader: Error reading file(s): s3a://****
   software.amazon.awssdk.core.exception.SdkClientException: Unable to execute 
HTTP request: The target server failed to respond
           at 
software.amazon.awssdk.core.exception.SdkClientException$BuilderImpl.build(SdkClientException.java:98)
 ~[sdk-core-2.15.40.jar:?]
           at 
software.amazon.awssdk.core.exception.SdkClientException.create(SdkClientException.java:43)
 ~[sdk-core-2.15.40.jar:?]
           at 
software.amazon.awssdk.core.internal.http.pipeline.stages.utils.RetryableStageHelper.setLastException(RetryableStageHelper.java:205)
 ~[sdk-core-2.15.40.jar:?]
           at 
software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.execute(RetryableStage.java:66)
 ~[sdk-core-2.15.40.jar:?]
   
           at 
software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.execute(RetryableStage.java:34)
 ~[sdk-core-2.15.40.jar:?]
           at 
software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
 ~[sdk-core-2.15.40.jar:?]
           at 
software.amazon.awssdk.core.internal.http.StreamManagingStage.execute(StreamManagingStage.java:56)
 ~[sdk-core-2.15.40.jar:?]
           at 
software.amazon.awssdk.core.internal.http.StreamManagingStage.execute(StreamManagingStage.java:36)
 ~[sdk-core-2.15.40.jar:?]
           at 
software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.executeWithTimer(ApiCallTimeoutTrackingStage.java:80)
 ~[sdk-core-2.15.40.jar:?]
           at 
software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:60)
 ~[sdk-core-2.15.40.jar:?]
           at 
software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:42)
 ~[sdk-core-2.15.40.jar:?]
           at 
software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallMetricCollectionStage.execute(ApiCallMetricCollectionStage.java:48)
 ~[sdk-core-2.15.40.jar:?]
           at 
software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallMetricCollectionStage.execute(ApiCallMetricCollectionStage.java:31)
 ~[sdk-core-2.15.40.jar:?]
           at 
software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
 ~[sdk-core-2.15.40.jar:?]
   
           at 
software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
 ~[sdk-core-2.15.40.jar:?]
           at 
software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:37)
 ~[sdk-core-2.15.40.jar:?]
           at 
software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:26)
 ~[sdk-core-2.15.40.jar:?]
           at 
software.amazon.awssdk.core.internal.http.AmazonSyncHttpClient$RequestExecutionBuilderImpl.execute(AmazonSyncHttpClient.java:193)
 ~[sdk-core-2.15.40.jar:?]
           at 
software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.invoke(BaseSyncClientHandler.java:133)
 ~[sdk-core-2.15.40.jar:?]
           at 
software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.doExecute(BaseSyncClientHandler.java:159)
 ~[sdk-core-2.15.40.jar:?]
           at 
software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.lambda$execute$1(BaseSyncClientHandler.java:112)
 ~[sdk-core-2.15.40.jar:?]
           at 
software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.measureApiCallSuccess(BaseSyncClientHandler.java:167)
 ~[sdk-core-2.15.40.jar:?]
           at 
software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.execute(BaseSyncClientHandler.java:94)
 ~[sdk-core-2.15.40.jar:?]
           at 
software.amazon.awssdk.core.client.handler.SdkSyncClientHandler.execute(SdkSyncClientHandler.java:45)
 ~[sdk-core-2.15.40.jar:?]
           at 
software.amazon.awssdk.awscore.client.handler.AwsSyncClientHandler.execute(AwsSyncClientHandler.java:55)
 ~[aws-core-2.15.40.jar:?]
           at 
software.amazon.awssdk.services.s3.DefaultS3Client.headObject(DefaultS3Client.java:5011)
 ~[s3-2.15.40.jar:?]
   
           at 
org.apache.iceberg.aws.s3.BaseS3File.getObjectMetadata(BaseS3File.java:85) 
~[iceberg-spark-runtime-3.3_2.12-1.0.15.jar]
           at 
org.apache.iceberg.aws.s3.S3InputFile.getLength(S3InputFile.java:75) 
~[iceberg-spark-runtime-3.3_2.12-1.0.15.jar]
           at 
org.apache.iceberg.parquet.ParquetIO$ParquetInputFile.getLength(ParquetIO.java:179)
 ~[iceberg-spark-runtime-3.3_2.12-1.0.15.jar]
           at 
org.apache.iceberg.shaded.org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:534)
 ~[iceberg-spark-runtime-3.3_2.12-1.0.15.jar]
           at 
org.apache.iceberg.shaded.org.apache.parquet.hadoop.ParquetFileReader.<init>(ParquetFileReader.java:777)
 ~[iceberg-spark-runtime-3.3_2.12-1.0.15.jar]
           at 
org.apache.iceberg.shaded.org.apache.parquet.hadoop.ParquetFileReader.open(ParquetFileReader.java:658)
 ~[iceberg-spark-runtime-3.3_2.12-1.0.15.jar]
           at org.apache.iceberg.parquet.ReadConf.newReader(ReadConf.java:245) 
~[iceberg-spark-runtime-3.3_2.12-1.0.15.jar]
           at org.apache.iceberg.parquet.ReadConf.<init>(ReadConf.java:81) 
~[iceberg-spark-runtime-3.3_2.12-1.0.15.jar]
           at 
org.apache.iceberg.parquet.ParquetReader.init(ParquetReader.java:71) 
~[iceberg-spark-runtime-3.3_2.12-1.0.15.jar]
           at 
org.apache.iceberg.parquet.ParquetReader.iterator(ParquetReader.java:91) 
~[iceberg-spark-runtime-3.3_2.12-1.0.15.jar]
           at 
org.apache.iceberg.parquet.ParquetReader.iterator(ParquetReader.java:37) 
~[iceberg-spark-runtime-3.3_2.12-1.0.15.jar]
           at org.apache.iceberg.util.Filter.lambda$filter$0(Filter.java:34) 
~[iceberg-spark-runtime-3.3_2.12-1.0.15.jar]
           at 
org.apache.iceberg.io.CloseableIterable$2.iterator(CloseableIterable.java:72) 
~[iceberg-spark-runtime-3.3_2.12-1.0.15.jar]
           at 
org.apache.iceberg.io.CloseableIterable$7$1.<init>(CloseableIterable.java:188) 
~[iceberg-spark-runtime-3.3_2.12-1.0.15.jar]
           at 
org.apache.iceberg.io.CloseableIterable$7.iterator(CloseableIterable.java:187) 
~[iceberg-spark-runtime-3.3_2.12-1.0.15.jar]
           at 
org.apache.iceberg.io.CloseableIterable$ConcatCloseableIterable$ConcatCloseableIterator.hasNext(CloseableIterable.java:257)
 ~[iceberg-spark-runtime-3.3_2.12-1.0.15.jar]
   
           at java.lang.Iterable.forEach(Iterable.java:74) ~[?:?]
           at 
org.apache.iceberg.deletes.Deletes.toPositionIndex(Deletes.java:138) 
~[iceberg-spark-runtime-3.3_2.12-1.0.15.jar]
           at 
org.apache.iceberg.deletes.Deletes.toPositionIndex(Deletes.java:132) 
~[iceberg-spark-runtime-3.3_2.12-1.0.15.jar]
           at 
org.apache.iceberg.data.DeleteFilter.applyPosDeletes(DeleteFilter.java:250) 
~[iceberg-spark-runtime-3.3_2.12-1.0.15.jar]
           at 
org.apache.iceberg.data.DeleteFilter.filter(DeleteFilter.java:154) 
~[iceberg-spark-runtime-3.3_2.12-1.0.15.jar]
           at 
org.apache.iceberg.spark.source.RowDataReader.open(RowDataReader.java:92) 
~[iceberg-spark-runtime-3.3_2.12-1.0.15.jar]
           at 
org.apache.iceberg.spark.source.RowDataReader.open(RowDataReader.java:42) 
~[iceberg-spark-runtime-3.3_2.12-1.0.15.jar]
           at 
org.apache.iceberg.spark.source.BaseReader.next(BaseReader.java:135) 
~[iceberg-spark-runtime-3.3_2.12-1.0.15.jar]
           at 
org.apache.spark.sql.execution.datasources.v2.PartitionIterator.hasNext(DataSourceRDD.scala:119)
 ~[spark-sql_2.12-3.3.0.48.jar]
        at 
org.apache.spark.sql.execution.datasources.v2.MetricsIterator.hasNext(DataSourceRDD.scala:156)
 ~[spark-sql_2.12-3.3.0.48.jar]
           at 
org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.$anonfun$hasNext$1(DataSourceRDD.scala:63)
 ~[spark-sql_2.12-3.3.0.48.jar]
   
           at 
org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.$anonfun$hasNext$1$adapted(DataSourceRDD.scala:63)
 ~[spark-sql_2.12-3.3.0.48.jar]
           at scala.Option.exists(Option.scala:376) 
~[scala-library-2.12.15.jar:?]
           at 
org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.hasNext(DataSourceRDD.scala:63)
 ~[spark-sql_2.12-3.3.0.48.jar]
           at 
org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.advanceToNextIter(DataSourceRDD.scala:97)
 ~[spark-sql_2.12-3.3.0.48.jar]
           at 
org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.hasNext(DataSourceRDD.scala:63)
 ~[spark-sql_2.12-3.3.0.48.jar]
           at 
org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37) 
~[spark-core_2.12-3.3.0.48.jar]
           at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460) 
~[scala-library-2.12.15.jar:?]
           at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown
 Source) ~[?:?]
           at 
org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
 ~[spark-sql_2.12-3.3.0.48.jar]
           at 
org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:760)
 ~[spark-sql_2.12-3.3.0.48.jar]
           at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:491) 
~[scala-library-2.12.15.jar:?]
           at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460) 
~[scala-library-2.12.15.jar:?]
           at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460) 
~[scala-library-2.12.15.jar:?]
           at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:513) 
~[scala-library-2.12.15.jar:?]
   
           at 
org.apache.spark.sql.execution.UnsafeExternalRowSorter.sort(UnsafeExternalRowSorter.java:225)
 ~[spark-sql_2.12-3.3.0.48.jar]
           at 
org.apache.spark.sql.execution.SortExec.$anonfun$doExecute$1(SortExec.scala:119)
 ~[spark-sql_2.12-3.3.0.48.jar]
           at 
org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:890) 
~[spark-core_2.12-3.3.0.48.jar]
           at 
org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:890)
 ~[spark-core_2.12-3.3.0.48.jar]
           at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) 
~[spark-core_2.12-3.3.0.48.jar]
           at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:365) 
~[spark-core_2.12-3.3.0.48.jar]
           at org.apache.spark.rdd.RDD.iterator(RDD.scala:329) 
~[spark-core_2.12-3.3.0.48.jar]
           at 
org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90) 
~[spark-core_2.12-3.3.0.48.jar]
           at org.apache.spark.scheduler.Task.run(Task.scala:136) 
~[spark-core_2.12-3.3.0.48.jar]
   
           at 
org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548)
 ~[spark-core_2.12-3.3.0.48.jar]
           at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1513) 
~[spark-core_2.12-3.3.0.48.jar]
           at 
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551) 
~[spark-core_2.12-3.3.0.48.jar]
           at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) 
~[?:?]
           at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) 
~[?:?]
           at java.lang.Thread.run(Thread.java:829) ~[?:?]
   
   Caused by: org.apache.http.NoHttpResponseException: The target server failed 
to respond
           at 
org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:141)
 ~[httpclient-4.5.13.jar:4.5.13]
           at 
org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:56)
 ~[httpclient-4.5.13.jar:4.5.13]
           at 
org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:259)
 ~[httpcore-4.4.14.jar:4.4.14]
   
           at 
org.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:163)
 ~[httpcore-4.4.14.jar:4.4.14]
           at 
org.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:157) 
~[httpclient-4.5.13.jar:4.5.13]
           at 
org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:273)
 ~[httpcore-4.4.14.jar:4.4.14]
           at 
org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
 ~[httpcore-4.4.14.jar:4.4.14]
           at 
org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:272) 
~[httpclient-4.5.13.jar:4.5.13]
           at 
org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:186) 
~[httpclient-4.5.13.jar:4.5.13]
           at 
org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
 ~[httpclient-4.5.13.jar:4.5.13]
   
           at 
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
 ~[httpclient-4.5.13.jar:4.5.13]
           at 
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56)
 ~[httpclient-4.5.13.jar:4.5.13]
           at 
software.amazon.awssdk.http.apache.internal.impl.ApacheSdkHttpClient.execute(ApacheSdkHttpClient.java:72)
 ~[apache-client-2.15.40.jar:?]
           at 
software.amazon.awssdk.http.apache.ApacheHttpClient.execute(ApacheHttpClient.java:253)
 ~[apache-client-2.15.40.jar:?]
           at 
software.amazon.awssdk.http.apache.ApacheHttpClient.access$500(ApacheHttpClient.java:106)
 ~[apache-client-2.15.40.jar:?]
           at 
software.amazon.awssdk.http.apache.ApacheHttpClient$1.call(ApacheHttpClient.java:232)
 ~[apache-client-2.15.40.jar:?]
           at 
software.amazon.awssdk.http.apache.ApacheHttpClient$1.call(ApacheHttpClient.java:229)
 ~[apache-client-2.15.40.jar:?]
           at 
software.amazon.awssdk.core.internal.util.MetricUtils.measureDurationUnsafe(MetricUtils.java:64)
 ~[sdk-core-2.15.40.jar:?]
           at 
software.amazon.awssdk.core.internal.http.pipeline.stages.MakeHttpRequestStage.executeHttpRequest(MakeHttpRequestStage.java:76)
 ~[sdk-core-2.15.40.jar:?]
           at 
software.amazon.awssdk.core.internal.http.pipeline.stages.MakeHttpRequestStage.execute(MakeHttpRequestStage.java:55)
 ~[sdk-core-2.15.40.jar:?]
           at 
software.amazon.awssdk.core.internal.http.pipeline.stages.MakeHttpRequestStage.execute(MakeHttpRequestStage.java:39)
 ~[sdk-core-2.15.40.jar:?]
           at 
software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
 ~[sdk-core-2.15.40.jar:?]
           at 
software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
 ~[sdk-core-2.15.40.jar:?]
           at 
software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
 ~[sdk-core-2.15.40.jar:?]
           at 
software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
 ~[sdk-core-2.15.40.jar:?]
           at 
software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptTimeoutTrackingStage.execute(ApiCallAttemptTimeoutTrackingStage.java:73)
 ~[sdk-core-2.15.40.jar:?]
           at 
software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptTimeoutTrackingStage.execute(ApiCallAttemptTimeoutTrackingStage.java:42)
 ~[sdk-core-2.15.40.jar:?]
           at 
software.amazon.awssdk.core.internal.http.pipeline.stages.TimeoutExceptionHandlingStage.execute(TimeoutExceptionHandlingStage.java:77)
 ~[sdk-core-2.15.40.jar:?]
           at 
software.amazon.awssdk.core.internal.http.pipeline.stages.TimeoutExceptionHandlingStage.execute(TimeoutExceptionHandlingStage.java:39)
 ~[sdk-core-2.15.40.jar:?]
           at 
software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptMetricCollectionStage.execute(ApiCallAttemptMetricCollectionStage.java:50)
 ~[sdk-core-2.15.40.jar:?]
           at 
software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptMetricCollectionStage.execute(ApiCallAttemptMetricCollectionStage.java:36)
 ~[sdk-core-2.15.40.jar:?]
           at 
software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.execute(RetryableStage.java:64)
 ~[sdk-core-2.15.40.jar:?]
           ... 78 more
   
   
   
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to