[I] S3 client for storage path not available [iceberg]

via GitHub Wed, 28 May 2025 09:46:49 -0700


and124578963 opened a new issue, #13174:
URL: https://github.com/apache/iceberg/issues/13174


   ### Apache Iceberg version
   
   main (development)
   
   ### Query engine
   
   Spark
   
   ### Please describe the bug 🐞
   
   Hey! I ran into a problem with the commit 
https://github.com/apache/iceberg/pull/12799#issuecomment-2913084781
   I'm using Spark 3.5_2.12 with jupyter-notebook.
    
   When I add these properties to enable S3FileIO, the problem appears:
   ``` 
     spark.sql.defaultCatalog: iceberg
     spark.sql.catalog.iceberg.io-impl: org.apache.iceberg.aws.s3.S3FileIO
     spark.sql.catalog.iceberg.s3.endpoint: http://...
     spark.sql.catalog.iceberg.client.credentials-provider: software.amazon.awssdk.auth.credentials.EnvironmentVariableCredentialsProvider
     spark.executorEnv.AWS_ENDPOINT: ${env:AWS_ENDPOINT}
     spark.executorEnv.AWS_ACCESS_KEY_ID: ${env:AWS_ACCESS_KEY_ID}
     spark.executorEnv.AWS_SECRET_ACCESS_KEY: ${env:AWS_SECRET_ACCESS_KEY}
     spark.executorEnv.AWS_REGION: ${env:AWS_REGION}
     spark.executorEnv.AWS_DEFAULT_REGION: ${env:AWS_DEFAULT_REGION}
     spark.serializer: org.apache.spark.serializer.KryoSerializer
   ```
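
For anyone reproducing this from a notebook, here is a minimal sketch of the same properties collected into a Python dict. The helper name `s3fileio_conf` is hypothetical, the endpoint is passed in because it is elided above, and skipping unset environment variables is an assumption of this sketch rather than what `${env:...}` does. The catalog implementation settings are not shown in this report, so they are left out here as well.

```python
import os

CATALOG = "iceberg"  # catalog name taken from the properties above


def s3fileio_conf(endpoint, env=None):
    """Return the Spark properties listed above as a plain dict.

    Hypothetical helper for illustration; `endpoint` is whatever S3
    endpoint URL is in use (the report elides it).
    """
    env = os.environ if env is None else env
    prefix = f"spark.sql.catalog.{CATALOG}"
    conf = {
        "spark.sql.defaultCatalog": CATALOG,
        f"{prefix}.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
        f"{prefix}.s3.endpoint": endpoint,
        f"{prefix}.client.credentials-provider": (
            "software.amazon.awssdk.auth.credentials."
            "EnvironmentVariableCredentialsProvider"
        ),
        "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
    }
    # Forward the driver's AWS variables to the executors.
    # Assumption: variables that are not set are skipped.
    for name in (
        "AWS_ENDPOINT",
        "AWS_ACCESS_KEY_ID",
        "AWS_SECRET_ACCESS_KEY",
        "AWS_REGION",
        "AWS_DEFAULT_REGION",
    ):
        if name in env:
            conf[f"spark.executorEnv.{name}"] = env[name]
    return conf


# Usage with PySpark (kept as comments so the sketch runs without Spark):
#   from pyspark.sql import SparkSession
#   builder = SparkSession.builder
#   for key, value in s3fileio_conf("http://...").items():
#       builder = builder.config(key, value)
#   spark = builder.getOrCreate()
```

Keeping the properties in one dict makes it easy to toggle a single setting (for example the `io-impl` line) when narrowing down which one triggers the error.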
   
   StackTrace:
   ```
   25/05/28 19:34:40 ERROR org.apache.spark.executor.Executor: Exception in task 7.0 in stage 1.0 (TID 8)
   java.lang.IllegalStateException: [BUG] S3 client for storage path not available: s3a://spark-nt/load_test_delete_warehouse/gen_transaction_icb_month/data/TRANSACTION_DATE_month=2010-06/00000-0-ac46d814-2605-4a4c-9dae-74d4a22f072a-00135.parquet
        at org.apache.iceberg.relocated.com.google.common.base.Preconditions.checkState(Preconditions.java:603) ~[iceberg-spark-runtime-3.5_2.12-apache-main-raw.jar:?]
        at org.apache.iceberg.aws.s3.S3FileIO.clientForStoragePath(S3FileIO.java:427) ~[iceberg-spark-runtime-3.5_2.12-apache-main-raw.jar:?]
        at org.apache.iceberg.aws.s3.S3FileIO.newInputFile(S3FileIO.java:184) ~[iceberg-spark-runtime-3.5_2.12-apache-main-raw.jar:?]
        at org.apache.iceberg.encryption.EncryptingFileIO.wrap(EncryptingFileIO.java:150) ~[iceberg-spark-runtime-3.5_2.12-apache-main-raw.jar:?]
        at org.apache.iceberg.relocated.com.google.common.collect.Iterators$6.transform(Iterators.java:828) ~[iceberg-spark-runtime-3.5_2.12-apache-main-raw.jar:?]
        at org.apache.iceberg.relocated.com.google.common.collect.TransformedIterator.next(TransformedIterator.java:51) ~[iceberg-spark-runtime-3.5_2.12-apache-main-raw.jar:?]
        at org.apache.iceberg.relocated.com.google.common.collect.TransformedIterator.next(TransformedIterator.java:51) ~[iceberg-spark-runtime-3.5_2.12-apache-main-raw.jar:?]
        at org.apache.iceberg.encryption.EncryptingFileIO.bulkDecrypt(EncryptingFileIO.java:63) ~[iceberg-spark-runtime-3.5_2.12-apache-main-raw.jar:?]
        at org.apache.iceberg.spark.source.BaseReader.inputFiles(BaseReader.java:177) ~[iceberg-spark-runtime-3.5_2.12-apache-main-raw.jar:?]
        at org.apache.iceberg.spark.source.BaseReader.getInputFile(BaseReader.java:170) ~[iceberg-spark-runtime-3.5_2.12-apache-main-raw.jar:?]
        at org.apache.iceberg.spark.source.BatchDataReader.open(BatchDataReader.java:100) ~[iceberg-spark-runtime-3.5_2.12-apache-main-raw.jar:?]
        at org.apache.iceberg.spark.source.BatchDataReader.open(BatchDataReader.java:43) ~[iceberg-spark-runtime-3.5_2.12-apache-main-raw.jar:?]
        at org.apache.iceberg.spark.source.BaseReader.next(BaseReader.java:134) ~[iceberg-spark-runtime-3.5_2.12-apache-main-raw.jar:?]
        at org.apache.spark.sql.execution.datasources.v2.PartitionIterator.hasNext(DataSourceRDD.scala:120) ~[spark-sql_2.12-3.5.4.jar:3.5.4]
        at org.apache.spark.sql.execution.datasources.v2.MetricsIterator.hasNext(DataSourceRDD.scala:158) ~[spark-sql_2.12-3.5.4.jar:3.5.4]
        at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.$anonfun$hasNext$1(DataSourceRDD.scala:63) ~[spark-sql_2.12-3.5.4.jar:3.5.4]
        at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.$anonfun$hasNext$1$adapted(DataSourceRDD.scala:63) ~[spark-sql_2.12-3.5.4.jar:3.5.4]
        at scala.Option.exists(Option.scala:376) ~[scala-library-2.12.18.jar:?]
        at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.hasNext(DataSourceRDD.scala:63) ~[spark-sql_2.12-3.5.4.jar:3.5.4]
        at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.advanceToNextIter(DataSourceRDD.scala:97) ~[spark-sql_2.12-3.5.4.jar:3.5.4]
        at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.hasNext(DataSourceRDD.scala:63) ~[spark-sql_2.12-3.5.4.jar:3.5.4]
        at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37) ~[spark-core_2.12-3.5.4.jar:3.5.4]
        at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460) ~[scala-library-2.12.18.jar:?]
        at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.columnartorow_nextBatch_0$(Unknown Source) ~[?:?]
        at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source) ~[?:?]
        at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) ~[spark-sql_2.12-3.5.4.jar:3.5.4]
        at org.apache.spark.sql.execution.WholeStageCodegenEvaluatorFactory$WholeStageCodegenPartitionEvaluator$$anon$1.hasNext(WholeStageCodegenEvaluatorFactory.scala:43) ~[spark-sql_2.12-3.5.4.jar:3.5.4]
        at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460) ~[scala-library-2.12.18.jar:?]
        at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460) ~[scala-library-2.12.18.jar:?]
        at org.apache.spark.util.random.SamplingUtils$.reservoirSampleAndCount(SamplingUtils.scala:41) ~[spark-core_2.12-3.5.4.jar:3.5.4]
        at org.apache.spark.RangePartitioner$.$anonfun$sketch$1(Partitioner.scala:322) ~[spark-core_2.12-3.5.4.jar:3.5.4]
        at org.apache.spark.RangePartitioner$.$anonfun$sketch$1$adapted(Partitioner.scala:320) ~[spark-core_2.12-3.5.4.jar:3.5.4]
        at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2(RDD.scala:910) ~[spark-core_2.12-3.5.4.jar:3.5.4]
        at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2$adapted(RDD.scala:910) ~[spark-core_2.12-3.5.4.jar:3.5.4]
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) ~[spark-core_2.12-3.5.4.jar:3.5.4]
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:367) ~[spark-core_2.12-3.5.4.jar:3.5.4]
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:331) ~[spark-core_2.12-3.5.4.jar:3.5.4]
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93) ~[spark-core_2.12-3.5.4.jar:3.5.4]
        at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:166) ~[spark-core_2.12-3.5.4.jar:3.5.4]
        at org.apache.spark.scheduler.Task.run(Task.scala:141) ~[spark-core_2.12-3.5.4.jar:3.5.4]
        at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:620) ~[spark-core_2.12-3.5.4.jar:3.5.4]
        at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64) ~[spark-common-utils_2.12-3.5.4.jar:3.5.4]
        at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61) ~[spark-common-utils_2.12-3.5.4.jar:3.5.4]
        at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94) ~[spark-core_2.12-3.5.4.jar:3.5.4]
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:623) [spark-core_2.12-3.5.4.jar:3.5.4]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
        at java.lang.Thread.run(Thread.java:840) [?:?]
   ```
   
   
   ### Willingness to contribute
   
   - [ ] I can contribute a fix for this bug independently
   - [ ] I would be willing to contribute a fix for this bug with guidance from the Iceberg community
   - [x] I cannot contribute a fix for this bug at this time


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org
