singhpk234 commented on code in PR #12270: URL: https://github.com/apache/iceberg/pull/12270#discussion_r1961054056
##########
spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/RemoveDanglingDeletesSparkAction.java:
##########
@@ -156,7 +162,12 @@ private List<DeleteFile> findDanglingDeletes() {
             .or(
                 col("data_file.content")
                     .equalTo("2")
-                    .and(col("sequence_number").$less$eq(col("min_data_sequence_number"))));
+                    .and(col("sequence_number").$less$eq(col("min_data_sequence_number"))))
+            // dvs pointing to non-existing data files
+            .or(
+                col("data_file.file_format")
+                    .equalTo(FileFormat.PUFFIN.name())

Review Comment:
   Apologies for the confusion, this comment was meant for the line below, essentially where we match the data file path against the file path the Puffin file points to.

   Can an exact equality check lead to a miss? For example, suppose the table contains the file path 's3://<tbl_location>/filea.parquet' but the Puffin files point to 's3a://<tbl_location>/filea.parquet'. Since we do an exact not-equal check, this case can be missed: the only difference is the scheme (s3 vs. s3a), but the file is there. Hence I was recommending the above.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
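
To make the scheme-mismatch concern in the review comment concrete, here is a minimal sketch of how an exact string comparison treats two locations that differ only in `s3` vs. `s3a`, and how normalizing the scheme before comparing avoids that. The class `PathNormalizationSketch`, the helper `normalize`, the alias set, and the bucket name are all hypothetical and used for illustration only; none of them come from the PR or from Iceberg's API.

```java
import java.util.Locale;
import java.util.Set;

public class PathNormalizationSketch {

  // Assumption for illustration: schemes treated as equivalent to "s3".
  private static final Set<String> S3_ALIASES = Set.of("s3", "s3a", "s3n");

  // Hypothetical helper: rewrites s3a:// and s3n:// locations to s3://
  // so that locations differing only in scheme compare as equal.
  static String normalize(String location) {
    int idx = location.indexOf("://");
    if (idx < 0) {
      return location; // no scheme present, leave unchanged
    }
    String scheme = location.substring(0, idx).toLowerCase(Locale.ROOT);
    if (S3_ALIASES.contains(scheme)) {
      return "s3" + location.substring(idx);
    }
    return location; // other schemes are left untouched
  }

  public static void main(String[] args) {
    String dataFilePath = "s3://bucket/tbl/data/filea.parquet";
    String referencedPath = "s3a://bucket/tbl/data/filea.parquet";

    // Exact comparison: the two strings differ, so the reference looks unmatched.
    System.out.println(dataFilePath.equals(referencedPath)); // prints "false"

    // Comparison after scheme normalization: the two locations match.
    System.out.println(normalize(dataFilePath).equals(normalize(referencedPath))); // prints "true"
  }
}
```

In a Spark join, the same idea would mean comparing normalized path columns rather than the raw strings; whether that is the right fix for this action is the question the comment raises.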