CodingJun commented on issue #10312:
URL: https://github.com/apache/iceberg/issues/10312#issuecomment-2472793323

   > Yeah, I think @RussellSpitzer is right: we should rely on the validation 
error to prevent this scenario, i.e. T1 should not be able to commit successfully.
   > 
   > I need to understand one thing:
   > 
   > > When I set use-starting-sequence-number = false for rewriteDataFiles, 
Thread 1's data file compaction failed at t4. Stack trace:
   > > Caused by: org.apache.iceberg.exceptions.ValidationException: Cannot 
commit, found new delete for replaced data file: GenericDataFile{content=data, 
file_path=/var/folders/5z/dqrlv_ts0wqf36vd39bb384h0000gn/T/junit17491575750166086656/9f77fae8-d62a-426d-971f-a342b6775c44/test_schema/test_table/data/00000-2-52ae94aa-b796-4c42-bf9c-92d36c52e522-00001.parquet,
 file_format=PARQUET, spec_id=0, partition=PartitionData{}, record_count=1, 
file_size_in_bytes=407, column_sizes=null, 
value_counts=org.apache.iceberg.util.SerializableMap@0, 
null_value_counts=org.apache.iceberg.util.SerializableMap@1, 
nan_value_counts=org.apache.iceberg.util.SerializableMap@0, 
lower_bounds=org.apache.iceberg.SerializableByteBufferMap@e1782, 
upper_bounds=org.apache.iceberg.SerializableByteBufferMap@e1782, 
key_metadata=null, split_offsets=[4], equality_ids=null, sort_order_id=null}
   > > at org.apache.iceberg.exceptions.ValidationException.check(ValidationException.java:50)
   > > at org.apache.iceberg.MergingSnapshotProducer.validateNoNewDeletesForDataFiles(MergingSnapshotProducer.java:418)
   > > at org.apache.iceberg.MergingSnapshotProducer.validateNoNewDeletesForDataFiles(MergingSnapshotProducer.java:367)
   > > at org.apache.iceberg.BaseRewriteFiles.validate(BaseRewriteFiles.java:108)
   > > at org.apache.iceberg.SnapshotProducer.apply(SnapshotProducer.java:175)
   > > at org.apache.iceberg.SnapshotProducer.lambda$commit$2(SnapshotProducer.java:296)
   > > at org.apache.iceberg.util.Tasks$Builder.runTaskWithRetry(Tasks.java:404)
   > > at org.apache.iceberg.util.Tasks$Builder.runSingleThreaded(Tasks.java:214)
   > > at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:198)
   > > at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:190)
   > > at org.apache.iceberg.SnapshotProducer.commit(SnapshotProducer.java:295)
   > > at org.apache.iceberg.actions.RewriteDataFilesCommitManager.commitFileGroups(RewriteDataFilesCommitManager.java:89)
   > > at org.apache.iceberg.actions.RewriteDataFilesCommitManager.commitOrClean(RewriteDataFilesCommitManager.java:110)
   > > at org.apache.iceberg.spark.actions.RewriteDataFilesSparkAction.doExecute(RewriteDataFilesSparkAction.java:291)
   > > ... 8 more
   > 
   > > Is your process running with use-starting-sequence-number = true?
   > > I tested with use-starting-sequence-number = true and the compaction 
failed (Apache Iceberg 1.4.3):
   > > Exception in thread "main" 
org.apache.iceberg.exceptions.ValidationException: Cannot commit, found new 
delete for replaced data file: GenericDataFile ...
   > 
   > From the above conversation it seems we get the ValidationException in both 
code paths, don't we?
   
   @RussellSpitzer @szehon-ho When I set use-starting-sequence-number = true, 
which is the default value, there was no ValidationException, but the 
equality delete files were lost.
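
   For context, here is a minimal sketch of why the two settings can behave 
differently. It is not Iceberg code: the class name, the helper 
`compactedFileSeq`, and the sequence numbers 1, 2, 3 are made up for 
illustration. The only rule it relies on is the one from the table spec, that 
an equality delete applies to a data file only when the data file's sequence 
number is strictly lower than the delete file's. It does not reproduce the 
lost deletes reported above.

```java
public class SequenceNumberSketch {

    // Spec rule: an equality delete applies to a data file only if the data
    // file's sequence number is strictly less than the delete file's.
    public static boolean equalityDeleteApplies(long dataFileSeq, long deleteFileSeq) {
        return dataFileSeq < deleteFileSeq;
    }

    // Hypothetical helper: the sequence number the compacted file ends up with.
    // With the option enabled it keeps the sequence number from when the
    // rewrite started; otherwise it takes the sequence number of the commit.
    public static long compactedFileSeq(boolean useStartingSeq, long startingSeq, long commitSeq) {
        return useStartingSeq ? startingSeq : commitSeq;
    }

    public static void main(String[] args) {
        long startingSeq = 1; // table sequence number when the compaction starts
        long deleteSeq = 2;   // a concurrent writer commits an equality delete
        long commitSeq = 3;   // sequence number of the compaction commit itself

        for (boolean flag : new boolean[] {true, false}) {
            long seq = compactedFileSeq(flag, startingSeq, commitSeq);
            System.out.println("use-starting-sequence-number=" + flag
                + " -> delete still applies to compacted file: "
                + equalityDeleteApplies(seq, deleteSeq));
        }
    }
}
```

   With `false`, the compacted file would be newer than the concurrent delete, 
so the delete would silently stop applying; that is consistent with the commit 
being rejected in validateNoNewDeletesForDataFiles in the stack trace above. 
With `true`, the delete should still apply under this rule, which is why losing 
it without any ValidationException looks wrong.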


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

