neon-hippo commented on issue #16493:
URL: https://github.com/apache/iceberg/issues/16493#issuecomment-4515504078

   Before submitting a fix, I wanted to raise a design question. The same 
problem, "is this file path under the table location?", is solved in two 
different ways:
   
    - listedFileDS() uses filesystem APIs (Hadoop listStatus, FileSystemWalker), 
which natively respect directory boundaries
    - filteredCompareToFileList() uses a raw String.startsWith(location) check, 
which does not
   
   Is there a reason filteredCompareToFileList() can't use the same 
filesystem-based approach instead of string matching? That would remove the bug 
entirely and reduce the two implementations to one. If there's a deliberate 
reason for the string-based path (performance on S3 file lists?), I'd like to 
understand it before attempting a fix.
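
   To make the difference concrete, here is a minimal sketch of how a raw 
prefix check can cross a directory boundary. It is not the Iceberg code: the 
class name PrefixCheck, both method names, and the example paths are 
hypothetical, and the separator-aware variant is only one possible 
string-level alternative, not the filesystem-based approach asked about above.

```java
public class PrefixCheck {

    // Raw prefix match: any path that merely begins with the same
    // characters as the location is accepted.
    public static boolean rawStartsWith(String location, String path) {
        return path.startsWith(location);
    }

    // Separator-aware match: the location is normalized to end with "/"
    // so that only paths inside that directory are accepted.
    public static boolean underLocation(String location, String path) {
        String prefix = location.endsWith("/") ? location : location + "/";
        return path.startsWith(prefix);
    }

    public static void main(String[] args) {
        String location = "s3://bucket/db/table";            // hypothetical table location
        String inside = "s3://bucket/db/table/data/f.parquet";
        String sibling = "s3://bucket/db/table_backup/data/f.parquet";

        System.out.println(rawStartsWith(location, inside));
        System.out.println(rawStartsWith(location, sibling));
        System.out.println(underLocation(location, inside));
        System.out.println(underLocation(location, sibling));
    }
}
```

   In this sketch the sibling directory table_backup passes the raw check but 
fails the separator-aware one, which is the kind of boundary a 
directory-listing API enforces by construction.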


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

