Akanksha-kedia opened a new pull request, #17294:
URL: https://github.com/apache/pinot/pull/17294

   
https://github.com/apache/pinot/pull/15473/commits/7cff2a4b105cfacbdce8da6934bdadeb3e17d789
   
   
   
   ## Issue in PR #15473
   
   ### What the PR Did (The Good Part)
   
   The PR added a new `deleteBatch()` method that deletes multiple files in a single call instead of one at a time. This is much faster, especially on cloud storage like S3, where every delete is a separate network round trip.
   
   __Example__: Instead of deleting 100 files one at a time (100 API calls), 
you can delete them all together (1 API call).
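The idea can be sketched as a default method that falls back to per-file deletes, which bulk-capable backends (like S3) can override with a single call. This is a hypothetical, simplified interface for illustration only; `SimpleFS`, its method signatures, and the semantics are assumptions that merely echo the PR's `deleteBatch()` idea, not the actual Pinot code.

```java
import java.io.IOException;
import java.net.URI;
import java.util.List;

// Hypothetical sketch of the batch-delete idea (NOT the real PinotFS API).
interface SimpleFS {
    // Delete a single file: one API call per invocation.
    boolean delete(URI uri, boolean forceDelete) throws IOException;

    // Default: fall back to N single deletes. A backend with a bulk API
    // (e.g. S3's DeleteObjects) would override this with one call.
    default boolean deleteBatch(List<URI> uris, boolean forceDelete) throws IOException {
        boolean allDeleted = true;
        for (URI uri : uris) {
            allDeleted &= delete(uri, forceDelete);
        }
        return allDeleted;
    }
}
```

With this shape, callers always invoke `deleteBatch()`, and each filesystem decides whether it can do better than the one-by-one fallback.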
   
   ### The Problems We Found (What Was Missing)
   
   #### Problem 1: Hadoop Filesystem Was Left Out
   
   - The PR added an optimized `deleteBatch()` for __S3__ (Amazon's cloud storage)
   - But the __Hadoop filesystem__ (HDFS) implementation was not updated, so it still deletes files one by one
   - This means Hadoop users don't get the performance improvement
   
   __Analogy__: It's like upgrading all cars to electric except the delivery trucks, which still run on the old technology.
   
   #### Problem 2: No Safety Check for Missing Files
   
   - The default `deleteBatch()` in `BasePinotFS` tries to delete files without checking whether they exist first
   - If a file has already been deleted or never existed, the delete throws an exception and aborts the rest of the batch
   - This can break the deletion process partway through
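A more defensive default could treat a missing file as already deleted and keep going. The sketch below is a hypothetical illustration, not the actual `BasePinotFS` code; the `SafeFS` interface and its method names are assumptions made for the example.

```java
import java.io.IOException;
import java.net.URI;
import java.util.List;

// Hypothetical sketch (NOT the real BasePinotFS): a defensive default
// deleteBatch() that skips files that no longer exist instead of
// aborting the whole batch on the first missing one.
interface SafeFS {
    boolean exists(URI uri) throws IOException;
    boolean delete(URI uri, boolean forceDelete) throws IOException;

    default boolean deleteBatch(List<URI> uris, boolean forceDelete) throws IOException {
        boolean allDeleted = true;
        for (URI uri : uris) {
            if (!exists(uri)) {
                continue; // already gone: nothing to do, don't fail the batch
            }
            allDeleted &= delete(uri, forceDelete);
        }
        return allDeleted;
    }
}
```

Note that an exists-then-delete sequence is inherently racy (the file could vanish between the two calls), so a real implementation might instead catch the backend's not-found exception around each `delete()` and treat it as success.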
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

