risey-yimu opened a new issue, #13781:
URL: https://github.com/apache/iceberg/issues/13781
### Apache Iceberg version
1.8.1
### Query engine
Spark
### Please describe the bug 🐞
Using Spark 3.5.3, `remove_orphan_files` fails. The Spark configuration is as follows:
```scala
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

// ls0Config is the reporter's own configuration object (names, URLs, credentials).
val sparkConf = new SparkConf()
  .set("spark.ui.enabled", "true")
  .set("spark.task.maxFailures", ls0Config.sparkTaskMaxFailures)
  .set("spark.rpc.message.maxSize", ls0Config.sparkRpcMessageMaxSize)
  .set("spark.sql.iceberg.handle-timestamp-without-timezone", "true")
  // Hadoop S3A settings
  .set("spark.hadoop.fs.s3a.access.key", ls0Config.jdbcCatalogS3AccessKey)
  .set("spark.hadoop.fs.s3a.secret.key", ls0Config.jdbcCatalogS3SecretKey)
  .set("spark.hadoop.fs.s3a.endpoint", ls0Config.jdbcCatalogS3Endpoint)
  .set("spark.hadoop.fs.s3a.path.style.access", "true")
  .set("spark.hadoop.fs.s3a.region", "cn-east-1")
  .set("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
  .set("spark.hadoop.fs.defaultFS", "s3a://warehouse")
  .set("spark.hadoop.fs.s3.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
  .set("spark.hadoop.fs.AbstractFileSystem.s3.impl", "org.apache.hadoop.fs.s3a.S3A")
  .set("spark.sql.extensions", "org.projectnessie.spark.extensions.NessieSparkSessionExtensions,org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
  // REST catalog (also the default catalog, see the last setting)
  .set(s"spark.sql.catalog.${ls0Config.restCatalogName}", "org.apache.iceberg.spark.SparkCatalog")
  .set(s"spark.sql.catalog.${ls0Config.restCatalogName}.type", "rest")
  .set(s"spark.sql.catalog.${ls0Config.restCatalogName}.uri", ls0Config.restCatalogURL)
  .set("spark.sql.catalogImplementation", "in-memory")
  // JDBC catalog using S3FileIO
  .set(s"spark.sql.catalog.${ls0Config.jdbcCatalogName}", "org.apache.iceberg.spark.SparkCatalog")
  .set(s"spark.sql.catalog.${ls0Config.jdbcCatalogName}.type", "jdbc")
  .set(s"spark.sql.catalog.${ls0Config.jdbcCatalogName}.uri", ls0Config.jdbcCatalogJDBCURL)
  .set(s"spark.sql.catalog.${ls0Config.jdbcCatalogName}.jdbc.user", ls0Config.jdbcCatalogJDBCUser)
  .set(s"spark.sql.catalog.${ls0Config.jdbcCatalogName}.jdbc.password", ls0Config.jdbcCatalogJDBCPwd)
  .set(s"spark.sql.catalog.${ls0Config.jdbcCatalogName}.warehouse", ls0Config.jdbcCatalogWarehousePath)
  .set(s"spark.sql.catalog.${ls0Config.jdbcCatalogName}.io-impl", "org.apache.iceberg.aws.s3.S3FileIO")
  .set(s"spark.sql.catalog.${ls0Config.jdbcCatalogName}.s3.endpoint", ls0Config.jdbcCatalogS3Endpoint)
  .set(s"spark.sql.catalog.${ls0Config.jdbcCatalogName}.s3.access-key-id", ls0Config.jdbcCatalogS3AccessKey)
  .set(s"spark.sql.catalog.${ls0Config.jdbcCatalogName}.s3.secret-access-key", ls0Config.jdbcCatalogS3SecretKey)
  .set(s"spark.sql.catalog.${ls0Config.jdbcCatalogName}.client.region", ls0Config.jdbcCatalogClientRegion)
  .set("spark.sql.defaultCatalog", ls0Config.restCatalogName)

val spark = SparkSession.builder.config(sparkConf).getOrCreate()
```
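The configuration repeats the `spark.sql.catalog.<name>.` prefix for every catalog option, which makes it hard to see at a glance what each catalog actually receives. As a minimal sketch (the `CatalogConf` object and its `catalogEntries` method are hypothetical, not a Spark or Iceberg API), the prefixing can be factored into a pure function that returns the fully qualified keys:

```scala
// Hypothetical helper, not part of Spark or Iceberg: expands a catalog name and
// its catalog-level options into fully qualified Spark configuration entries.
object CatalogConf {
  def catalogEntries(catalogName: String, options: Map[String, String]): Map[String, String] = {
    val prefix = s"spark.sql.catalog.$catalogName"
    // The bare prefix key selects the catalog implementation class.
    Map(prefix -> "org.apache.iceberg.spark.SparkCatalog") ++
      options.map { case (key, value) => s"$prefix.$key" -> value }
  }
}
```

The result could be applied with `sparkConf.setAll(CatalogConf.catalogEntries(ls0Config.jdbcCatalogName, Map("type" -> "jdbc", "uri" -> ls0Config.jdbcCatalogJDBCURL)))`, since `SparkConf.setAll` accepts any `Iterable[(String, String)]`; printing the same map is a quick way to review each catalog's effective options.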
I call the Spark SQL procedure `remove_orphan_files` to delete orphan files with the following code:
```scala
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "-1")
spark.sparkContext.setLogLevel("INFO")
spark.sql("ALTER TABLE nessie.demo.ods_source_nome_t4 SET TBLPROPERTIES ('gc.enabled' = 'true')")
spark.sql(
  """
    |CALL nessie.system.remove_orphan_files(
    |  table => 'nessie.demo.ods_source_nome_test',
    |  location => 's3a://warehouse/demo/ods_source_nome_test_e0341727-b242-434a-ad20-d6b120d6fd59/data/dt=2025-07-31/',
    |  older_than => TIMESTAMP '2025-08-03 00:00:00.000')
    |""".stripMargin)
```
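`remove_orphan_files` also accepts `dry_run => true`, in which case it only reports the files it would delete (in the `orphan_file_location` output column) and issues no S3 delete requests. Running it that way first separates the listing step, which evidently works here since the orphan paths are printed, from the deletion step that fails. A minimal sketch that builds the same statement with a dry-run toggle; `OrphanFilesSql.removeOrphanFiles` is a hypothetical helper, not an Iceberg API:

```scala
// Hypothetical helper: builds a CALL statement for Iceberg's remove_orphan_files
// procedure. With dryRun = true the procedure only lists candidate files.
object OrphanFilesSql {
  def removeOrphanFiles(
      catalog: String,
      table: String,
      location: String,
      olderThan: String,
      dryRun: Boolean): String = {
    Seq(table, location, olderThan).foreach { v =>
      // Keep the sketch simple: reject values that would break the quoted literals.
      require(!v.contains("'"), s"unexpected quote in: $v")
    }
    s"CALL $catalog.system.remove_orphan_files(" +
      s"table => '$table', " +
      s"location => '$location', " +
      s"older_than => TIMESTAMP '$olderThan', " +
      s"dry_run => $dryRun)"
  }
}
```

For example, `spark.sql(OrphanFilesSql.removeOrphanFiles("nessie", "nessie.demo.ods_source_nome_test", "s3a://warehouse/demo/ods_source_nome_test_e0341727-b242-434a-ad20-d6b120d6fd59/data/dt=2025-07-31/", "2025-08-03 00:00:00.000", dryRun = true)).show(false)` would list the candidates without touching the S3 delete path.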
but I got the following logs:
```
java.lang.IllegalArgumentException: Invalid S3 URI: 'http://los.uisee.com/warehouse?delete'
at org.apache.iceberg.rest.ErrorHandlers$DefaultErrorHandler.accept(ErrorHandlers.java:206) ~[iceberg-spark-runtime-3.5_2.12-1.8.1.jar:?]
at org.apache.iceberg.rest.ErrorHandlers$DefaultErrorHandler.accept(ErrorHandlers.java:188) ~[iceberg-spark-runtime-3.5_2.12-1.8.1.jar:?]
at org.apache.iceberg.rest.HTTPClient.throwFailure(HTTPClient.java:224) ~[iceberg-spark-runtime-3.5_2.12-1.8.1.jar:?]
at org.apache.iceberg.rest.HTTPClient.execute(HTTPClient.java:308) ~[iceberg-spark-runtime-3.5_2.12-1.8.1.jar:?]
at org.apache.iceberg.rest.BaseHTTPClient.post(BaseHTTPClient.java:100) ~[iceberg-spark-runtime-3.5_2.12-1.8.1.jar:?]
at org.apache.iceberg.aws.s3.signer.S3V4RestSignerClient.sign(S3V4RestSignerClient.java:351) ~[iceberg-spark-runtime-3.5_2.12-1.8.1.jar:?]
at software.amazon.awssdk.core.internal.http.pipeline.stages.SigningStage.lambda$signRequest$4(SigningStage.java:154) ~[iceberg-aws-bundle-1.8.1.jar:?]
at software.amazon.awssdk.core.internal.util.MetricUtils.measureDuration(MetricUtils.java:63) ~[iceberg-aws-bundle-1.8.1.jar:?]
at software.amazon.awssdk.core.internal.http.pipeline.stages.SigningStage.signRequest(SigningStage.java:153) ~[iceberg-aws-bundle-1.8.1.jar:?]
at software.amazon.awssdk.core.internal.http.pipeline.stages.SigningStage.execute(SigningStage.java:72) ~[iceberg-aws-bundle-1.8.1.jar:?]
at software.amazon.awssdk.core.internal.http.pipeline.stages.SigningStage.execute(SigningStage.java:50) ~[iceberg-aws-bundle-1.8.1.jar:?]
at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206) ~[iceberg-aws-bundle-1.8.1.jar:?]
at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206) ~[iceberg-aws-bundle-1.8.1.jar:?]
at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206) ~[iceberg-aws-bundle-1.8.1.jar:?]
at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206) ~[iceberg-aws-bundle-1.8.1.jar:?]
at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206) ~[iceberg-aws-bundle-1.8.1.jar:?]
at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptTimeoutTrackingStage.execute(ApiCallAttemptTimeoutTrackingStage.java:74) ~[iceberg-aws-bundle-1.8.1.jar:?]
at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptTimeoutTrackingStage.execute(ApiCallAttemptTimeoutTrackingStage.java:43) ~[iceberg-aws-bundle-1.8.1.jar:?]
at software.amazon.awssdk.core.internal.http.pipeline.stages.TimeoutExceptionHandlingStage.execute(TimeoutExceptionHandlingStage.java:79) ~[iceberg-aws-bundle-1.8.1.jar:?]
at software.amazon.awssdk.core.internal.http.pipeline.stages.TimeoutExceptionHandlingStage.execute(TimeoutExceptionHandlingStage.java:41) ~[iceberg-aws-bundle-1.8.1.jar:?]
at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptMetricCollectionStage.execute(ApiCallAttemptMetricCollectionStage.java:55) ~[iceberg-aws-bundle-1.8.1.jar:?]
at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptMetricCollectionStage.execute(ApiCallAttemptMetricCollectionStage.java:39) ~[iceberg-aws-bundle-1.8.1.jar:?]
at software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage2.executeRequest(RetryableStage2.java:93) ~[iceberg-aws-bundle-1.8.1.jar:?]
at software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage2.execute(RetryableStage2.java:56) ~[iceberg-aws-bundle-1.8.1.jar:?]
at software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage2.execute(RetryableStage2.java:36) ~[iceberg-aws-bundle-1.8.1.jar:?]
at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206) ~[iceberg-aws-bundle-1.8.1.jar:?]
at software.amazon.awssdk.core.internal.http.StreamManagingStage.execute(StreamManagingStage.java:53) ~[iceberg-aws-bundle-1.8.1.jar:?]
at software.amazon.awssdk.core.internal.http.StreamManagingStage.execute(StreamManagingStage.java:35) ~[iceberg-aws-bundle-1.8.1.jar:?]
at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.executeWithTimer(ApiCallTimeoutTrackingStage.java:82) ~[iceberg-aws-bundle-1.8.1.jar:?]
at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:62) ~[iceberg-aws-bundle-1.8.1.jar:?]
at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:43) ~[iceberg-aws-bundle-1.8.1.jar:?]
at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallMetricCollectionStage.execute(ApiCallMetricCollectionStage.java:50) ~[iceberg-aws-bundle-1.8.1.jar:?]
at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallMetricCollectionStage.execute(ApiCallMetricCollectionStage.java:32) ~[iceberg-aws-bundle-1.8.1.jar:?]
at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206) ~[iceberg-aws-bundle-1.8.1.jar:?]
at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206) ~[iceberg-aws-bundle-1.8.1.jar:?]
at software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:37) ~[iceberg-aws-bundle-1.8.1.jar:?]
at software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:26) ~[iceberg-aws-bundle-1.8.1.jar:?]
at software.amazon.awssdk.core.internal.http.AmazonSyncHttpClient$RequestExecutionBuilderImpl.execute(AmazonSyncHttpClient.java:210) ~[iceberg-aws-bundle-1.8.1.jar:?]
at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.invoke(BaseSyncClientHandler.java:103) ~[iceberg-aws-bundle-1.8.1.jar:?]
at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.doExecute(BaseSyncClientHandler.java:173) ~[iceberg-aws-bundle-1.8.1.jar:?]
at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.lambda$execute$1(BaseSyncClientHandler.java:80) ~[iceberg-aws-bundle-1.8.1.jar:?]
at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.measureApiCallSuccess(BaseSyncClientHandler.java:182) ~[iceberg-aws-bundle-1.8.1.jar:?]
at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.execute(BaseSyncClientHandler.java:74) ~[iceberg-aws-bundle-1.8.1.jar:?]
at software.amazon.awssdk.core.client.handler.SdkSyncClientHandler.execute(SdkSyncClientHandler.java:45) ~[iceberg-aws-bundle-1.8.1.jar:?]
at software.amazon.awssdk.awscore.client.handler.AwsSyncClientHandler.execute(AwsSyncClientHandler.java:53) ~[iceberg-aws-bundle-1.8.1.jar:?]
at software.amazon.awssdk.services.s3.DefaultS3Client.deleteObjects(DefaultS3Client.java:3626) ~[iceberg-aws-bundle-1.8.1.jar:?]
at org.apache.iceberg.aws.s3.S3FileIO.deleteBatch(S3FileIO.java:281) ~[iceberg-spark-runtime-3.5_2.12-1.8.1.jar:?]
at org.apache.iceberg.aws.s3.S3FileIO.lambda$deleteFiles$3(S3FileIO.java:219) ~[iceberg-spark-runtime-3.5_2.12-1.8.1.jar:?]
at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
at java.lang.Thread.run(Thread.java:829) [?:?]
17:52:36.791 [main] WARN org.apache.iceberg.aws.s3.S3FileIO - Failed to delete object at path s3://warehouse/demo/ods_source_nome_test_e0341727-b242-434a-ad20-d6b120d6fd59/data/dt=2025-07-31/00000-0-cd989951-6115-4da4-bb6e-b8942a69ba31-00177.parquet
17:52:36.791 [main] WARN org.apache.iceberg.aws.s3.S3FileIO - Failed to delete object at path s3://warehouse/demo/ods_source_nome_test_e0341727-b242-434a-ad20-d6b120d6fd59/data/dt=2025-07-31/00000-0-cd989951-6115-4da4-bb6e-b8942a69ba31-00191.parquet
17:52:36.791 [main] WARN org.apache.iceberg.aws.s3.S3FileIO - Failed to delete object at path s3://warehouse/demo/ods_source_nome_test_e0341727-b242-434a-ad20-d6b120d6fd59/data/dt=2025-07-31/00001-0-baf52e17-063f-4a42-a02a-18b1286b8a77-00186.parquet
17:52:36.791 [main] WARN org.apache.iceberg.aws.s3.S3FileIO - Failed to delete object at path s3://warehouse/demo/ods_source_nome_test_e0341727-b242-434a-ad20-d6b120d6fd59/data/dt=2025-07-31/00000-0-cd989951-6115-4da4-bb6e-b8942a69ba31-00186.parquet
17:52:36.791 [main] WARN org.apache.iceberg.aws.s3.S3FileIO - Failed to delete object at path s3://warehouse/demo/ods_source_nome_test_e0341727-b242-434a-ad20-d6b120d6fd59/data/dt=2025-07-31/00001-0-baf52e17-063f-4a42-a02a-18b1286b8a77-00190.parquet
17:52:36.791 [main] WARN org.apache.iceberg.aws.s3.S3FileIO - Failed to delete object at path s3://warehouse/demo/ods_source_nome_test_e0341727-b242-434a-ad20-d6b120d6fd59/data/dt=2025-07-31/00001-0-baf52e17-063f-4a42-a02a-18b1286b8a77-00176.parquet
17:52:36.791 [main] WARN org.apache.iceberg.aws.s3.S3FileIO - Failed to delete object at path s3://warehouse/demo/ods_source_nome_test_e0341727-b242-434a-ad20-d6b120d6fd59/data/dt=2025-07-31/00001-0-baf52e17-063f-4a42-a02a-18b1286b8a77-00195.parquet
17:52:36.791 [main] WARN org.apache.iceberg.aws.s3.S3FileIO - Failed to delete object at path s3://warehouse/demo/ods_source_nome_test_e0341727-b242-434a-ad20-d6b120d6fd59/data/dt=2025-07-31/00000-0-cd989951-6115-4da4-bb6e-b8942a69ba31-00182.parquet
17:52:36.791 [main] WARN org.apache.iceberg.aws.s3.S3FileIO - Failed to delete object at path s3://warehouse/demo/ods_source_nome_test_e0341727-b242-434a-ad20-d6b120d6fd59/data/dt=2025-07-31/00001-0-baf52e17-063f-4a42-a02a-18b1286b8a77-00177.parquet
17:52:36.791 [main] WARN org.apache.iceberg.aws.s3.S3FileIO - Failed to delete object at path s3://warehouse/demo/ods_source_nome_test_e0341727-b242-434a-ad20-d6b120d6fd59/data/dt=2025-07-31/00000-0-cd989951-6115-4da4-bb6e-b8942a69ba31-00173.parquet
17:52:36.792 [main] WARN org.apache.iceberg.aws.s3.S3FileIO - Failed to delete object at path s3://warehouse/demo/ods_source_nome_test_e0341727-b242-434a-ad20-d6b120d6fd59/data/dt=2025-07-31/00001-0-baf52e17-063f-4a42-a02a-18b1286b8a77-00182.parquet
17:52:36.792 [main] WARN org.apache.iceberg.spark.actions.DeleteOrphanFilesSparkAction - Deleted only 0 of 11 files using bulk deletes
17:52:36.809 [main] INFO org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator - Code generated in 3.935872 ms
17:52:36.826 [main] INFO org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator - Code generated in 9.080008 ms
[s3a://warehouse/demo/ods_source_nome_test_e0341727-b242-434a-ad20-d6b120d6fd59/data/dt=2025-07-31/00000-0-cd989951-6115-4da4-bb6e-b8942a69ba31-00173.parquet]
[s3a://warehouse/demo/ods_source_nome_test_e0341727-b242-434a-ad20-d6b120d6fd59/data/dt=2025-07-31/00000-0-cd989951-6115-4da4-bb6e-b8942a69ba31-00177.parquet]
[s3a://warehouse/demo/ods_source_nome_test_e0341727-b242-434a-ad20-d6b120d6fd59/data/dt=2025-07-31/00000-0-cd989951-6115-4da4-bb6e-b8942a69ba31-00182.parquet]
```
Why does deleting orphan files fail, and how can I solve this problem?
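In the trace, the failure happens while `S3FileIO.deleteBatch` issues an S3 `DeleteObjects` call and `S3V4RestSignerClient.sign` posts the request to the REST catalog's signing endpoint; the `IllegalArgumentException` is then raised by `ErrorHandlers$DefaultErrorHandler`, so the message appears to come from the server's reply. The warning `Deleted only 0 of 11 files using bulk deletes` shows that every batched delete failed. S3 `DeleteObjects` is a bucket-level `POST ...?delete` request, so with path-style access its URI names only the bucket (`warehouse`) plus the `delete` query, and no object key. One plausible but unconfirmed reading is that the signing endpoint only accepts object-level URIs. The sketch below uses only `java.net.URI`; the `SignedUri` name is hypothetical. It splits both kinds of request URI to make the difference visible:

```scala
import java.net.URI

// Hypothetical helper: splits a path-style S3 request URI
// (http://endpoint/bucket/key?query) into bucket, object key and query string.
object SignedUri {
  final case class Parts(bucket: String, key: String, query: Option[String])

  def split(raw: String): Parts = {
    val uri = new URI(raw)
    // getPath is "/bucket" or "/bucket/key..."; drop the leading slash.
    val path = Option(uri.getPath).getOrElse("").stripPrefix("/")
    // Everything before the first '/' is the bucket, the remainder is the key.
    val (bucket, rest) = path.span(_ != '/')
    Parts(bucket, rest.stripPrefix("/"), Option(uri.getQuery))
  }
}
```

Applied to `http://los.uisee.com/warehouse?delete` this yields bucket `warehouse`, an empty key and query `delete`, whereas a single-object URI carries the key in its path. If the reading above holds, per-object deletes might not hit the same error: `DeleteOrphanFilesSparkAction` falls back to non-bulk deletes when a custom function is passed via `deleteWith`, but whether that avoids the signer error here is untested.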
### Willingness to contribute
- [ ] I can contribute a fix for this bug independently
- [x] I would be willing to contribute a fix for this bug with guidance from
the Iceberg community
- [x] I cannot contribute a fix for this bug at this time