szehon-ho commented on issue #10312: URL: https://github.com/apache/iceberg/issues/10312#issuecomment-2261085249
yea I think @RussellSpitzer is right, we should rely on the validation error to prevent this scenario here, i.e. T1 should not be able to commit successfully.

I need to understand one thing:

> When I set use-starting-sequence-number = false for rewriteDataFiles, Thread 1 compact data files failed at t4. stacktrace:
>
> Caused by: org.apache.iceberg.exceptions.ValidationException: Cannot commit, found new delete for replaced data file: GenericDataFile{content=data, file_path=/var/folders/5z/dqrlv_ts0wqf36vd39bb384h0000gn/T/junit17491575750166086656/9f77fae8-d62a-426d-971f-a342b6775c44/test_schema/test_table/data/00000-2-52ae94aa-b796-4c42-bf9c-92d36c52e522-00001.parquet, file_format=PARQUET, spec_id=0, partition=PartitionData{}, record_count=1, file_size_in_bytes=407, column_sizes=null, value_counts=org.apache.iceberg.util.SerializableMap@0, null_value_counts=org.apache.iceberg.util.SerializableMap@1, nan_value_counts=org.apache.iceberg.util.SerializableMap@0, lower_bounds=org.apache.iceberg.SerializableByteBufferMap@e1782, upper_bounds=org.apache.iceberg.SerializableByteBufferMap@e1782, key_metadata=null, split_offsets=[4], equality_ids=null, sort_order_id=null}
>     at org.apache.iceberg.exceptions.ValidationException.check(ValidationException.java:50)
>     at org.apache.iceberg.MergingSnapshotProducer.validateNoNewDeletesForDataFiles(MergingSnapshotProducer.java:418)
>     at org.apache.iceberg.MergingSnapshotProducer.validateNoNewDeletesForDataFiles(MergingSnapshotProducer.java:367)
>     at org.apache.iceberg.BaseRewriteFiles.validate(BaseRewriteFiles.java:108)
>     at org.apache.iceberg.SnapshotProducer.apply(SnapshotProducer.java:175)
>     at org.apache.iceberg.SnapshotProducer.lambda$commit$2(SnapshotProducer.java:296)
>     at org.apache.iceberg.util.Tasks$Builder.runTaskWithRetry(Tasks.java:404)
>     at org.apache.iceberg.util.Tasks$Builder.runSingleThreaded(Tasks.java:214)
>     at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:198)
>     at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:190)
>     at org.apache.iceberg.SnapshotProducer.commit(SnapshotProducer.java:295)
>     at org.apache.iceberg.actions.RewriteDataFilesCommitManager.commitFileGroups(RewriteDataFilesCommitManager.java:89)
>     at org.apache.iceberg.actions.RewriteDataFilesCommitManager.commitOrClean(RewriteDataFilesCommitManager.java:110)
>     at org.apache.iceberg.spark.actions.RewriteDataFilesSparkAction.doExecute(RewriteDataFilesSparkAction.java:291)
>     ... 8 more

> your process is in use-starting-sequence-number = true ?
> I test with use-starting-sequence-number = true and compact failed(apache iceberg1.4.3):
> Exception in thread "main" org.apache.iceberg.exceptions.ValidationException: Cannot commit, found new delete for replaced data file: GenericDataFile ...

From the above conversation it seems we get the ValidationException in both code paths (use-starting-sequence-number = false and = true), don't we?