Akanksha-kedia opened a new pull request, #17294: URL: https://github.com/apache/pinot/pull/17294
https://github.com/apache/pinot/pull/15473/commits/7cff2a4b105cfacbdce8da6934bdadeb3e17d789

## Issue in PR #15473

### What the PR Did (The Good Part)

The PR added a new `deleteBatch()` method that deletes multiple files in a single operation instead of one by one. This is much faster, especially on cloud storage like S3.

__Example__: Instead of deleting 100 files one at a time (100 API calls), you can delete them all together (1 API call), as the S3 sketch at the end of this description illustrates.

### The Problems We Found (What Was Missing)

#### Problem 1: Hadoop Filesystem Was Left Out

- The PR added an optimized `deleteBatch()` for __S3__ (Amazon's storage).
- But the __Hadoop filesystem__ (HDFS) was left out - it still deletes files one by one.
- This means Hadoop users don't get the performance improvement.

__Analogy__: It's like upgrading all cars to electric except the delivery trucks - they still run on the old technology.

#### Problem 2: No Safety Check for Missing Files

- The default `deleteBatch()` in `BasePinotFS` tries to delete files without first checking that they exist.
- If a file is already deleted or doesn't exist, the method throws an error and stops.
- A single missing file can therefore break the entire batch deletion (see the second sketch below).
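To make the batching concrete, here is a minimal sketch of a batch delete against S3 using the AWS SDK v2 `DeleteObjects` API, which removes up to 1000 keys per request. The `deleteKeys` helper and the class name are ours for illustration; this is not code from PR #15473:

```java
import java.util.List;
import java.util.stream.Collectors;

import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.Delete;
import software.amazon.awssdk.services.s3.model.DeleteObjectsRequest;
import software.amazon.awssdk.services.s3.model.ObjectIdentifier;

public final class S3BatchDeleteExample {
  private S3BatchDeleteExample() {
  }

  // One DeleteObjects request removes up to 1000 keys, versus one
  // DeleteObject call per key when deleting files one at a time.
  public static void deleteKeys(S3Client s3, String bucket, List<String> keys) {
    List<ObjectIdentifier> ids = keys.stream()
        .map(key -> ObjectIdentifier.builder().key(key).build())
        .collect(Collectors.toList());
    s3.deleteObjects(DeleteObjectsRequest.builder()
        .bucket(bucket)
        .delete(Delete.builder().objects(ids).build())
        .build());
  }
}
```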
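And a minimal sketch of the safety check Problem 2 asks for, written as a standalone helper because the exact `deleteBatch()` signature in `BasePinotFS` may differ from what we assume here (a list of URIs plus the usual `forceDelete` flag). It leans only on the existing `PinotFS.exists()` and `PinotFS.delete()` methods:

```java
import java.io.IOException;
import java.net.URI;
import java.util.List;

import org.apache.pinot.spi.filesystem.PinotFS;

public final class SafeBatchDelete {
  private SafeBatchDelete() {
  }

  // Hypothetical helper mirroring what a tolerant default could do:
  // skip files that are already gone instead of letting one missing
  // file abort the whole batch.
  public static boolean deleteBatch(PinotFS fs, List<URI> segmentUris, boolean forceDelete)
      throws IOException {
    boolean allSucceeded = true;
    for (URI uri : segmentUris) {
      if (!fs.exists(uri)) {
        // Already deleted or never existed: treat as a no-op, not a failure
        continue;
      }
      allSucceeded &= fs.delete(uri, forceDelete);
    }
    return allSucceeded;
  }
}
```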
