maoli67660 opened a new pull request, #17094:
URL: https://github.com/apache/iceberg/pull/17094

   ## Problem
   
   `filteredCompareToFileList()` in `DeleteOrphanFilesSparkAction` filters the 
caller-provided `file_list_view` dataset using:
   
   ```java
   files = files.filter(files.col(FILE_PATH).startsWith(location));
   ```
   
   When `location` has no trailing `/`, any sibling path that merely shares the 
table location as a raw string prefix also passes the filter. For example, with 
the table at `s3://bucket/my_table`, files under `s3://bucket/my_table_backup/` 
also satisfy `startsWith("s3://bucket/my_table")` and are pulled into orphan 
detection for the wrong table.
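
The false match can be reproduced with plain Java strings, independent of Spark. The sketch below is illustrative only: the class name `RawPrefixDemo`, the helper `matchesRawPrefix`, and the file names are hypothetical, and it assumes the column-level `startsWith` behaves like an ordinary string-prefix check.

```java
public class RawPrefixDemo {
    // Hypothetical helper: a plain string-prefix comparison with no
    // trailing-slash guard, analogous to the filter quoted above.
    public static boolean matchesRawPrefix(String filePath, String location) {
        return filePath.startsWith(location);
    }

    public static void main(String[] args) {
        String location = "s3://bucket/my_table";

        // A file that really lives under the table location.
        String inTable = "s3://bucket/my_table/data/a.parquet";
        // A file under a sibling directory that shares the raw prefix.
        String sibling = "s3://bucket/my_table_backup/data/b.parquet";

        System.out.println(matchesRawPrefix(inTable, location)); // prints "true"
        System.out.println(matchesRawPrefix(sibling, location)); // prints "true" (the false match)
    }
}
```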
   
   Fixes #16493.
   
   ## Solution
   
   Append `/` to `location` before the prefix match:
   
   ```java
   String locationPrefix = location.endsWith("/") ? location : location + "/";
   files = files.filter(files.col(FILE_PATH).startsWith(locationPrefix));
   ```
   
   The change is applied to Spark 3.5, 4.0, and 4.1.
   
   ## Testing
   
   Added `testRemoveOrphanFilesFileListViewDoesNotMatchSiblingPaths` to 
`TestRemoveOrphanFilesProcedure` in all three Spark versions. The test:
   
   1. Creates an empty Iceberg table at a known location
   2. Builds a `file_list_view` that includes an orphan file inside the table 
directory **and** a file under a sibling path (`table-location + "-sibling"`)
   3. Runs `remove_orphan_files` with `file_list_view` and `dry_run => true`
   4. Asserts the sibling file is **not** identified as an orphan, and the 
in-table orphan **is** identified
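
The core assertion of the steps above can be approximated without Spark or Iceberg. This is a rough sketch, not the actual test: `FileListFilterSketch`, `filterToLocation`, and the `file:/tmp/...` paths are hypothetical stand-ins, and an in-memory list replaces the `file_list_view` dataset.

```java
import java.util.List;
import java.util.stream.Collectors;

public class FileListFilterSketch {
    // Hypothetical stand-in for the filter: keep only paths under the table
    // location, normalizing the location to end with "/" before matching.
    public static List<String> filterToLocation(List<String> paths, String location) {
        String locationPrefix = location.endsWith("/") ? location : location + "/";
        return paths.stream()
            .filter(path -> path.startsWith(locationPrefix))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String tableLocation = "file:/tmp/tbl"; // assumed location, for illustration
        String inTableOrphan = tableLocation + "/data/orphan.parquet";
        String siblingFile = tableLocation + "-sibling/data/other.parquet";

        List<String> candidates =
            filterToLocation(List.of(inTableOrphan, siblingFile), tableLocation);

        // Only the in-table file should remain as an orphan candidate.
        if (!candidates.equals(List.of(inTableOrphan))) {
            throw new AssertionError("unexpected candidates: " + candidates);
        }
        System.out.println(candidates);
    }
}
```

A location that already ends with `/` goes through the same normalization unchanged, so both spellings of the table location yield the same candidate list.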


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

