wombatu-kun opened a new pull request, #16498:
URL: https://github.com/apache/iceberg/pull/16498

   ## Summary
   
   Closes #16493.
   
   `DeleteOrphanFilesSparkAction.filteredCompareToFileList()` previously scoped a 
user-supplied `compareToFileList` to the action's `location` field using a raw 
`files.col(FILE_PATH).startsWith(location)` filter. When `location` lacks a 
trailing path separator (the typical production shape for storage URIs like 
`s3://bucket/table` returned by `Table.location()`), that filter also accepts 
sibling paths such as `s3://bucket/table-backup/...`. Files in those sibling 
directories then entered the orphan candidate set and could be deleted.
   
   This PR normalizes the prefix to directory form via 
`LocationUtil.stripTrailingSlash(location) + "/"` before the `startsWith` filter. 
The same `+ "/"` shape is already used in `SnapshotTableSparkAction` (lines 
131-132) to prevent identical sibling-prefix collisions, so this aligns the 
orphan-files action with that existing precedent. The fix is applied 
identically to all three currently supported Spark version trees (v3.5, v4.0, 
v4.1); their source files were byte-identical for this method, so the patch is 
mechanical.
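
   The difference between the raw prefix and the directory-form prefix can be 
sketched in plain Java, without Spark or Iceberg on the classpath. This is an 
illustration only, not the patched code: the class `PrefixScope`, its method 
names, and the local `stripTrailingSlash` helper are hypothetical stand-ins, and 
the helper is a simplified approximation of `LocationUtil.stripTrailingSlash`.

```java
public class PrefixScope {

  // Hypothetical, simplified stand-in for LocationUtil.stripTrailingSlash:
  // removes any trailing "/" characters from the location.
  public static String stripTrailingSlash(String location) {
    String result = location;
    while (result.endsWith("/")) {
      result = result.substring(0, result.length() - 1);
    }
    return result;
  }

  // Raw prefix check, the shape of the filter before the fix.
  public static boolean rawPrefixMatch(String location, String path) {
    return path.startsWith(location);
  }

  // Directory-form prefix check, the shape of the filter after the fix.
  public static boolean directoryPrefixMatch(String location, String path) {
    return path.startsWith(stripTrailingSlash(location) + "/");
  }

  public static void main(String[] args) {
    String location = "s3://bucket/table";
    String inside = "s3://bucket/table/data/file.parquet";
    String sibling = "s3://bucket/table-backup/data/file.parquet";

    // The raw check accepts both the table's own file and the sibling's file.
    System.out.println("raw, inside:        " + rawPrefixMatch(location, inside));
    System.out.println("raw, sibling:       " + rawPrefixMatch(location, sibling));

    // The directory-form check accepts only the table's own file.
    System.out.println("directory, inside:  " + directoryPrefixMatch(location, inside));
    System.out.println("directory, sibling: " + directoryPrefixMatch(location, sibling));
  }
}
```

Stripping before appending `"/"` means a location that already ends with a 
separator, such as `s3://bucket/table/`, yields the same prefix as one that 
does not.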
   
   The directory-listing path (`listedFileDS()`) is unaffected: it uses Hadoop's 
`FileSystem.listStatus` from a single root, which is inherently bounded to that 
directory.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

