CodingJun commented on issue #10312: URL: https://github.com/apache/iceberg/issues/10312#issuecomment-2472793323
> yea I think @RussellSpitzer is right, we should rely on validation error to prevent this scenario here, ie T1 should not be able to commit successfully.
>
> i need to understand one thing:
>
> > When I set use-starting-sequence-number = false for rewriteDataFiles, Thread 1 compact data files failed at t4. stacktrace:
> >
> > Caused by: org.apache.iceberg.exceptions.ValidationException: Cannot commit, found new delete for replaced data file: GenericDataFile{content=data, file_path=/var/folders/5z/dqrlv_ts0wqf36vd39bb384h0000gn/T/junit17491575750166086656/9f77fae8-d62a-426d-971f-a342b6775c44/test_schema/test_table/data/00000-2-52ae94aa-b796-4c42-bf9c-92d36c52e522-00001.parquet, file_format=PARQUET, spec_id=0, partition=PartitionData{}, record_count=1, file_size_in_bytes=407, column_sizes=null, value_counts=org.apache.iceberg.util.SerializableMap@0, null_value_counts=org.apache.iceberg.util.SerializableMap@1, nan_value_counts=org.apache.iceberg.util.SerializableMap@0, lower_bounds=org.apache.iceberg.SerializableByteBufferMap@e1782, upper_bounds=org.apache.iceberg.SerializableByteBufferMap@e1782, key_metadata=null, split_offsets=[4], equality_ids=null, sort_order_id=null}
> > at org.apache.iceberg.exceptions.ValidationException.check(ValidationException.java:50)
> > at org.apache.iceberg.MergingSnapshotProducer.validateNoNewDeletesForDataFiles(MergingSnapshotProducer.java:418)
> > at org.apache.iceberg.MergingSnapshotProducer.validateNoNewDeletesForDataFiles(MergingSnapshotProducer.java:367)
> > at org.apache.iceberg.BaseRewriteFiles.validate(BaseRewriteFiles.java:108)
> > at org.apache.iceberg.SnapshotProducer.apply(SnapshotProducer.java:175)
> > at org.apache.iceberg.SnapshotProducer.lambda$commit$2(SnapshotProducer.java:296)
> > at org.apache.iceberg.util.Tasks$Builder.runTaskWithRetry(Tasks.java:404)
> > at org.apache.iceberg.util.Tasks$Builder.runSingleThreaded(Tasks.java:214)
> > at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:198)
> > at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:190)
> > at org.apache.iceberg.SnapshotProducer.commit(SnapshotProducer.java:295)
> > at org.apache.iceberg.actions.RewriteDataFilesCommitManager.commitFileGroups(RewriteDataFilesCommitManager.java:89)
> > at org.apache.iceberg.actions.RewriteDataFilesCommitManager.commitOrClean(RewriteDataFilesCommitManager.java:110)
> > at org.apache.iceberg.spark.actions.RewriteDataFilesSparkAction.doExecute(RewriteDataFilesSparkAction.java:291)
> > ... 8 more
>
> > your process is in use-starting-sequence-number = true ?
> >
> > I test with use-starting-sequence-number = true and compact failed(apache iceberg1.4.3):
> >
> > Exception in thread "main" org.apache.iceberg.exceptions.ValidationException: Cannot commit, found new delete for replaced data file: GenericDataFile ...
>
> from above conversation it seem we get the validationException in both code-paths, isnt it?

@RussellSpitzer @szehon-ho When I set use-starting-sequence-number = true, which is the default value, there was no ValidationException, but the equality delete files were lost.
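
For anyone following the thread, here is a rough sketch of why the two settings behave differently at commit time. It is a toy model, not Iceberg's code: the class and method names (`SequenceNumberSketch`, `rewrittenFileSeq`, `equalityDeleteApplies`) and the sequence numbers 1, 2, 3 are made up for illustration. The only rule it relies on is the one from the table spec, that an equality delete applies to a data file only when the data file's data sequence number is strictly less than the delete's. It does not try to explain why the deletes were lost in my test with the default setting.

```java
/** Toy model of how data sequence numbers decide whether an equality delete applies. */
final class SequenceNumberSketch {

  /** Spec rule: an equality delete applies only to data files with a strictly smaller data sequence number. */
  static boolean equalityDeleteApplies(long dataFileSeq, long deleteFileSeq) {
    return dataFileSeq < deleteFileSeq;
  }

  /**
   * Hypothetical helper: the data sequence number a compacted file would carry.
   * With use-starting-sequence-number = true it keeps the sequence number of the
   * snapshot the compaction started from; otherwise it gets the new snapshot's.
   */
  static long rewrittenFileSeq(long startingSeq, long commitSeq, boolean useStartingSequenceNumber) {
    return useStartingSequenceNumber ? startingSeq : commitSeq;
  }

  public static void main(String[] args) {
    long startingSeq = 1; // snapshot the compaction started from (illustrative value)
    long deleteSeq = 2;   // equality delete committed while the compaction was running
    long commitSeq = 3;   // snapshot produced by the compaction commit

    for (boolean flag : new boolean[] {true, false}) {
      long seq = rewrittenFileSeq(startingSeq, commitSeq, flag);
      System.out.println(
          "use-starting-sequence-number=" + flag
              + " -> rewritten file seq=" + seq
              + ", equality delete still applies=" + equalityDeleteApplies(seq, deleteSeq));
    }
  }
}
```

In this model, with the option set to false the rewritten file would end up newer than the concurrent equality delete, so the delete could no longer apply to it; that is consistent with the commit being rejected by `validateNoNewDeletesForDataFiles` in the stack trace above. With the option set to true the rewritten file stays older than the delete, so by the spec rule the delete should still apply, which is why losing it looks like a bug to me.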