[I] add_files with RestCatalog, S3FileIO [iceberg]

via GitHub Fri, 15 Nov 2024 01:37:50 -0800


DongSeungLee opened a new issue, #11558:
URL: https://github.com/apache/iceberg/issues/11558


   ### Query engine
   
   Spark 3.5.3
   
   ### Question
   
   For study purposes, I run a standalone Spark cluster locally, and I have developed my own IcebergRestCatalog.
   My IcebergRestCatalog follows the Iceberg spec from version 1.6.1.
   I run add_files as shown below.
   ```
   CALL iceberg.system.add_files(
   table => 'yearly_month_clicks',
   source_table => '`parquet`.`s3a://dataquery-warehouse/iceberg/data`'
   );
   ```
   The following error occurs:
   ```
   Caused by: org.apache.iceberg.exceptions.RuntimeIOException: Failed to get file system for path: s3://dataquery-warehouse/iceberg/dataquery/yearly_month_clicks/metadata/stage-31-task-1619-manifest-855c8009-c073-48b0-9fd7-e12c1daf8930.avro
        at org.apache.iceberg.hadoop.Util.getFs(Util.java:58)
        at org.apache.iceberg.hadoop.HadoopOutputFile.fromPath(HadoopOutputFile.java:53)
        at org.apache.iceberg.hadoop.HadoopFileIO.newOutputFile(HadoopFileIO.java:97)
        at org.apache.iceberg.spark.SparkTableUtil.buildManifest(SparkTableUtil.java:368)
        at org.apache.iceberg.spark.SparkTableUtil.lambda$importSparkPartitions$1e94a719$1(SparkTableUtil.java:796)
        at org.apache.spark.sql.Dataset.$anonfun$mapPartitions$1(Dataset.scala:3414)
        at org.apache.spark.sql.execution.MapPartitionsExec.$anonfun$doExecute$3(objects.scala:198)
        at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:893)
        at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:893)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:367)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:331)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:367)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:331)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:367)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:331)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93)
        at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:166)
        at org.apache.spark.scheduler.Task.run(Task.scala:141)
        at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:620)
        at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
        at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
        at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:623)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
        at java.base/java.lang.Thread.run(Thread.java:840)
   Caused by: org.apache.hadoop.fs.UnsupportedFileSystemException: No FileSystem for scheme "s3"
        at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:3443)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3466)
        at org.apache.hadoop.fs.FileSystem.access$300(FileSystem.java:174)
        at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3574)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3521)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:540)
        at org.apache.hadoop.fs.Path.getFileSystem(Path.java:365)
        at org.apache.iceberg.hadoop.Util.getFs(Util.java:56)
   ```
   From my point of view, Spark tries to create the staging metadata under the location recorded in the Iceberg table metadata.
   Here, the Iceberg metadata location starts with `s3`, so the scheme is fixed as s3.
   Spark tries to access the file system through Hadoop's S3AFileSystem, so it seems the s3 scheme is not supported.
   How can I overcome this issue?
   Thanks, sincerely
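
To illustrate the scheme mismatch described above, here is a minimal Python sketch. It relies on the fact that Hadoop looks up a FileSystem class through the `fs.<scheme>.impl` configuration key, and that Spark copies any setting prefixed with `spark.hadoop.` into the Hadoop configuration. The helper names `hadoop_impl_key` and `spark_alias_conf` are hypothetical and exist only for this illustration. Whether aliasing `s3` to `S3AFileSystem` actually resolves this particular failure is an assumption: it presumes `hadoop-aws` is on the executor classpath and that the s3a credentials and endpoint are already configured.

```python
from urllib.parse import urlparse

# Real Hadoop class shipped in hadoop-aws; assumed to be on the classpath.
S3A_IMPL = "org.apache.hadoop.fs.s3a.S3AFileSystem"


def hadoop_impl_key(path: str) -> str:
    """Return the Hadoop configuration key consulted for this path's URI scheme."""
    return f"fs.{urlparse(path).scheme}.impl"


def spark_alias_conf(path: str, impl: str = S3A_IMPL) -> dict:
    """Build a Spark setting that maps the path's scheme to a FileSystem class.

    The "spark.hadoop." prefix tells Spark to pass the setting to Hadoop.
    """
    return {f"spark.hadoop.{hadoop_impl_key(path)}": impl}


# Paths taken from the CALL statement and the stack trace above.
source_path = "s3a://dataquery-warehouse/iceberg/data"
manifest_path = (
    "s3://dataquery-warehouse/iceberg/dataquery/yearly_month_clicks/metadata/"
    "stage-31-task-1619-manifest-855c8009-c073-48b0-9fd7-e12c1daf8930.avro"
)

print(hadoop_impl_key(source_path))    # fs.s3a.impl
print(hadoop_impl_key(manifest_path))  # fs.s3.impl (the key Hadoop could not resolve)
print(spark_alias_conf(manifest_path))
```

Under those assumptions, the resulting entry could be supplied when the Spark session is created, for example as `--conf spark.hadoop.fs.s3.impl=org.apache.hadoop.fs.s3a.S3AFileSystem`.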


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


