a8356555 opened a new issue, #9521: URL: https://github.com/apache/iceberg/issues/9521
### Query engine

Spark

### Question

I am currently using Flink to stream data into an Iceberg table. The Flink job writes to the Iceberg table every minute. Because this produces too many small files, I use the following Spark SQL for maintenance.

```sql
CALL glue_catalog.system.rewrite_data_files(
  table => 'my_db.my_table',
  options => map(
    'max-concurrent-file-group-rewrites', '4',
    'partial-progress.enabled', 'true')
)
```

However, during the maintenance process, I encountered the following error.

```
24/01/17 18:12:29 ERROR RewriteDataFilesCommitManager: Cannot commit groups [RewriteFileGroup{info=FileGroupInfo{globalIndex=350, partitionIndex=1, partition=PartitionData{date_utc8=19483}}, numRewrittenFiles=163, numAddedFiles=2, numRewrittenBytes=140070183}, ...], attempting to clean up written files
org.apache.iceberg.exceptions.CommitFailedException: Cannot commit GlueCatalog.bitopro_ods.ods_bitopro_mysql_order_matches because base metadata location 's3://production-data-glue-iceberg-warehouse/bitopro_ods.db/ods_bitopro_mysql_order_matches/metadata/66157-79e49b17-b6c6-432b-9e06-c38b2150c312.metadata.json' is not same as the current Glue location 's3://production-data-glue-iceberg-warehouse/bitopro_ods.db/ods_bitopro_mysql_order_matches/metadata/66160-9b65de57-7904-4e08-b215-f88f00b8c66d.metadata.json'
    at org.apache.iceberg.aws.glue.GlueTableOperations.checkMetadataLocation(GlueTableOperations.java:272)
    at org.apache.iceberg.aws.glue.GlueTableOperations.doCommit(GlueTableOperations.java:158)
    at org.apache.iceberg.BaseMetastoreTableOperations.commit(BaseMetastoreTableOperations.java:135)
    at org.apache.iceberg.SnapshotProducer.lambda$commit$2(SnapshotProducer.java:390)
    at org.apache.iceberg.util.Tasks$Builder.runTaskWithRetry(Tasks.java:413)
    at org.apache.iceberg.util.Tasks$Builder.runSingleThreaded(Tasks.java:219)
    at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:203)
    at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:196)
    at org.apache.iceberg.SnapshotProducer.commit(SnapshotProducer.java:364)
    at org.apache.iceberg.actions.RewriteDataFilesCommitManager.commitFileGroups(RewriteDataFilesCommitManager.java:78)
    at org.apache.iceberg.actions.RewriteDataFilesCommitManager.commitOrClean(RewriteDataFilesCommitManager.java:100)
    at org.apache.iceberg.actions.RewriteDataFilesCommitManager$CommitService.commitOrClean(RewriteDataFilesCommitManager.java:134)
    at org.apache.iceberg.actions.BaseCommitService.commitReadyCommitGroups(BaseCommitService.java:205)
    at org.apache.iceberg.actions.BaseCommitService.offer(BaseCommitService.java:133)
    at org.apache.iceberg.spark.actions.RewriteDataFilesSparkAction.lambda$doExecuteWithPartialProgress$4(RewriteDataFilesSparkAction.java:355)
    at org.apache.iceberg.util.Tasks$Builder.runTaskWithRetry(Tasks.java:413)
    at org.apache.iceberg.util.Tasks$Builder.access$300(Tasks.java:69)
    at org.apache.iceberg.util.Tasks$Builder$1.run(Tasks.java:315)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
    at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.base/java.lang.Thread.run(Unknown Source)
```

It looks like a concurrent write issue: the base metadata file changed (from 66157 to 66160) between the start of the rewrite and its commit. However, the Flink job cannot be stopped. How can I solve this error?