SandeepSinghGahir commented on issue #10340:
URL: https://github.com/apache/iceberg/issues/10340#issuecomment-2550591724

   Hi @amogh-jahagirdar,
   
   This issue isn't resolved yet. After the Glue 5.0 release, I tested a job with Iceberg 1.7.0 and I'm still seeing the same error, just with different logging. The stack trace from the failed run is included below.
   
   Any help in resolving this issue is greatly appreciated.
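
   For context, the job uses a fairly standard Iceberg-on-Glue Spark setup, roughly like the sketch below (the catalog name, warehouse path, and sample query are placeholders for illustration, not the actual job code):

   ```python
   from pyspark.sql import SparkSession

   # Placeholder catalog name and warehouse path, not the real job values.
   spark = (
       SparkSession.builder.appName("iceberg-glue-job")
       .config("spark.sql.extensions",
               "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
       .config("spark.sql.catalog.glue_catalog",
               "org.apache.iceberg.spark.SparkCatalog")
       .config("spark.sql.catalog.glue_catalog.catalog-impl",
               "org.apache.iceberg.aws.glue.GlueCatalog")
       .config("spark.sql.catalog.glue_catalog.io-impl",
               "org.apache.iceberg.aws.s3.S3FileIO")
       .config("spark.sql.catalog.glue_catalog.warehouse",
               "s3://example-bucket/warehouse/")
       .getOrCreate()
   )

   # The failure surfaces during shuffle-heavy work against the Iceberg table;
   # this query is only a stand-in for the real workload.
   spark.sql("SELECT count(*) FROM glue_catalog.example_db.example_table").show()
   ```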
   
   ```
   ERROR        2024-12-07T02:04:22,219 814843  com.amazonaws.services.glueexceptionanalysis.GlueExceptionAnalysisListener      [spark-listener-group-shared]   [Glue Exception Analysis] {"Event":"GlueExceptionAnalysisStageFailed","Timestamp":1733537062218,"Failure Reason":"org.apache.spark.shuffle.FetchFailedException: Error in reading FileSegmentManagedBuffer[file=/tmp/blockmgr-e22f16fc-d99e-4692-aa4b-66a91/0c/shuffle_11_118332_0.data,offset=288812863,length=188651]","Stack Trace":[{"Declaring Class":"org.apache.spark.errors.SparkCoreErrors$","Method Name":"fetchFailedError","File Name":"SparkCoreErrors.scala","Line Number":437},{"Declaring Class":"org.apache.spark.storage.ShuffleBlockFetcherIterator","Method Name":"throwFetchFailedException","File Name":"ShuffleBlockFetcherIterator.scala","Line Number":1304},{"Declaring Class":"org.apache.spark.storage.ShuffleBlockFetcherIterator","Method Name":"next","File Name":"ShuffleBlockFetcherIterator.scala","Line Number":957},{"Declaring Class":"org.apache.spark.storage.Shuffl
   ERROR        2024-12-07T02:04:25,531 818155  org.apache.spark.scheduler.TaskSchedulerImpl    [dispatcher-CoarseGrainedScheduler]     Lost executor 95 on 172.34.30.9: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
        at java.base/java.lang.Thread.run(Thread.java:840)
   Caused by: org.apache.spark.ExecutorDeadException: [INTERNAL_ERROR_NETWORK] The relative remote executor(Id: 95), which maintains the block data to fetch is dead.
        at org.apache.spark.network.netty.NettyBlockTransferService$$anon$2.createAndStart(NettyBlockTransferService.scala:145)
        at org.apache.spark.network.shuffle.RetryingBlockTransferor.transferAllOutstanding(RetryingBlockTransferor.java:173)
        at org.apache.spark.network.shuffle.RetryingBlockTransferor.lambda$initiateRetry$0(RetryingBlockTransferor.java:206)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
        at java.bas
        )at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
        at java.base/java.lang.Thread.run(Thread.java:840)
   Caused by: org.apache.spark.ExecutorDeadException: [INTERNAL_ERROR_NETWORK] The relative remote executor(Id: 95), which maintains the block data to fetch is dead.
        at org.apache.spark.network.netty.NettyBlockTransferService$$anon$2.createAndStart(NettyBlockTransferService.scala:145)
        at org.apache.spark.network.shuffle.RetryingBlockTransferor.transferAllOutstanding(RetryingBlockTransferor.java:173)
        at org.apache.spark.network.shuffle.RetryingBlockTransferor.lambda$initiateRetry$0(RetryingBlockTransferor.java:206)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
        at java.ba
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
        at java.base/java.lang.Thread.run(Thread.java:840)
   Caused by: org.apache.spark.ExecutorDeadException: [INTERNAL_ERROR_NETWORK] The relative remote executor(Id: 95), which maintains the block data to fetch is dead.
        at org.apache.spark.network.netty.NettyBlockTransferService$$anon$2.createAndStart(NettyBlockTransferService.scala:145)
        at org.apache.spark.network.shuffle.RetryingBlockTransferor.transferAllOutstanding(RetryingBlockTransferor.java:173)
        at org.apache.spark.network.shuffle.RetryingBlockTransferor.start(RetryingBlockTransferor.java:152)
        at org.apache.spark.network.netty.NettyBlockTransferService.fetchBlocks(NettyBlockTransferService.scala:155)
        at org.apache.spark.storage.ShuffleBlockFetcherIterator.sendRequest(ShuffleBlockFetcherIterator.scala:403)
        at org.apache.spark.storage.ShuffleBlockFetcherIterator.send$1

        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
        at java.base/java.lang.Thread.run(Thread.java:840)
   Caused by: org.apache.spark.ExecutorDeadException: [INTERNAL_ERROR_NETWORK] The relative remote executor(Id: 95), which maintains the block data to fetch is dead.
        at org.apache.spark.network.netty.NettyBlockTransferService$$anon$2.createAndStart(NettyBlockTransferService.scala:145)
        at org.apache.spark.network.shuffle.RetryingBlockTransferor.transferAllOutstanding(RetryingBlockTransferor.java:173)
        at org.apache.spark.network.shuffle.RetryingBlockTransferor.lambda$initiateRetry$0(RetryingBlockTransferor.java:206)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
        at java.bas

        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
        at java.base/java.lang.Thread.run(Thread.java:840)
   Caused by: org.apache.spark.ExecutorDeadException: [INTERNAL_ERROR_NETWORK] The relative remote executor(Id: 95), which maintains the block data to fetch is dead.
        at org.apache.spark.network.netty.NettyBlockTransferService$$anon$2.createAndStart(NettyBlockTransferService.scala:145)
        at org.apache.spark.network.shuffle.RetryingBlockTransferor.transferAllOutstanding(RetryingBlockTransferor.java:173)
        at org.apache.spark.network.shuffle.RetryingBlockTransferor.lambda$initiateRetry$0(RetryingBlockTransferor.java:206)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
        at java.bas
   )
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
        at java.base/java.lang.Thread.run(Thread.java:840)
   Caused by: org.apache.spark.ExecutorDeadException: [INTERNAL_ERROR_NETWORK] The relative remote executor(Id: 95), which maintains the block data to fetch is dead.
        at org.apache.spark.network.netty.NettyBlockTransferService$$anon$2.createAndStart(NettyBlockTransferService.scala:145)
        at org.apache.spark.network.shuffle.RetryingBlockTransferor.transferAllOutstanding(RetryingBlockTransferor.java:173)
        at org.apache.spark.network.shuffle.RetryingBlockTransferor.lambda$initiateRetry$0(RetryingBlockTransferor.java:206)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
        at java.ba
   ERROR        2024-12-07T02:04:28,291 820915  com.amazonaws.services.glueexceptionanalysis.GlueExceptionAnalysisListener      [spark-listener-group-shared]   [Glue Exception Analysis] {"Event":"GlueExceptionAnalysisTaskFailed","Timestamp":1733537068290,"Failure Reason":"Connection pool shut down","Stack Trace":[{"Declaring Class":"software.amazon.awssdk.thirdparty.org.apache.http.util.Asserts","Method Name":"check","File Name":"Asserts.java","Line Number":34},{"Declaring Class":"software.amazon.awssdk.thirdparty.org.apache.http.impl.conn.PoolingHttpClientConnectionManager","Method Name":"requestConnection","File Name":"PoolingHttpClientConnectionManager.java","Line Number":269},{"Declaring Class":"software.amazon.awssdk.http.apache.internal.conn.ClientConnectionManagerFactory$DelegatingHttpClientConnectionManager","Method Name":"requestConnection","File Name":"ClientConnectionManagerFactory.java","Line Number":75},{"Declaring Class":"software.amazon.awssdk.http.apache.internal.conn.ClientConnectionManagerFactory$Instrumented
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
        at java.base/java.lang.Thread.run(Thread.java:840)
   Caused by: org.apache.spark.ExecutorDeadException: [INTERNAL_ERROR_NETWORK] The relative remote executor(Id: 95), which maintains the block data to fetch is dead.
        at org.apache.spark.network.netty.NettyBlockTransferService$$anon$2.createAndStart(NettyBlockTransferService.scala:145)
        at org.apache.spark.network.shuffle.RetryingBlockTransferor.transferAllOutstanding(RetryingBlockTransferor.java:173)
        at org.apache.spark.network.shuffle.RetryingBlockTransferor.lambda$initiateRetry$0(RetryingBlockTransferor.java:206)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
        at java.base
   ERROR        2024-12-07T02:04:28,401 821025  org.apache.spark.scheduler.TaskSetManager       [task-result-getter-3]  Task 47 in stage 51.3 failed 4 times; aborting job
   ERROR        2024-12-07T02:04:28,408 821032  com.amazonaws.services.glueexceptionanalysis.GlueExceptionAnalysisListener      [spark-listener-group-shared]   [Glue Exception Analysis] {"Event":"GlueExceptionAnalysisJobFailed","Timestamp":1733537068406,"Failure Reason":"JobFailed(org.apache.spark.SparkException: Job aborted due to stage failure: Task 47 in stage 51.3 failed 4 times, most recent failure: Lost task 47.3 in stage 51.3 (TID 172048) (172.36.175.193 executor 46): java.lang.IllegalStateException: Connection pool shut down","Stack Trace":[{"Declaring Class":"org.apache.spark.SparkException: Job aborted due to stage failure: Task 47 in stage 51.3 failed 4 times, most recent failure: Lost task 47.3 in stage 51.3 (TID 172048) (172.36.175.193 executor 46): java.lang.IllegalStateException: Connection pool shut down","Method Name":"TopLevelFailedReason","File Name":"TopLevelFailedReason","Line Number":-1},{"Declaring Class":"software.amazon.awssdk.thirdparty.org.apache.http.util.Asserts","Method Name":"check","File Name":"
   ```

