snleee opened a new issue, #11633:
URL: https://github.com/apache/pinot/issues/11633

   `pinotFS.listFiles()` can be expensive when we use a remote file system (e.g. 
AWS S3) and the number of segments is very high (e.g. 100k). A high number of 
partitions makes the problem even worse. We have seen cases where 
`pinotFS.listFiles()` calls consume most of the threads available for API calls.
   
   
![image](https://github.com/apache/pinot/assets/27253407/3fa17a6e-067e-4060-94d5-734277ced826)
   
![image](https://github.com/apache/pinot/assets/27253407/bb6f032c-7de8-4bc3-accd-c1883a9e258a)
   
   Instead of fetching the full list and filtering on `<segmentName>.tmp.` to 
detect the files to delete, we should keep the temp file name and directly try to 
delete the file based on that.
   
   ```
   try {
     // Lists every file under the table directory just to find this segment's temp files.
     for (String uri : pinotFS.listFiles(tableDirURI, false)) {
       if (uri.contains(SegmentCompletionUtils.getSegmentNamePrefix(segmentName))) {
         LOGGER.warn("Deleting temporary segment file: {}", uri);
         Preconditions.checkState(pinotFS.delete(new URI(uri), true), "Failed to delete file: %s", uri);
       }
     }
   } catch (Exception e) {
     LOGGER.warn("Caught exception while deleting temporary segment files for segment: {}", segmentName, e);
   }
   ```
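
A minimal sketch of the proposed direction, assuming the temp file URI is known at the time the file is written. `SimpleFS`, `InMemoryFS`, and `TempSegmentFileTracker` are hypothetical names made up for illustration; `SimpleFS` only stands in for the `delete` call on `pinotFS` and is not Pinot's actual interface. The point is that cleanup touches only the recorded URIs and never lists the table directory.

```java
import java.net.URI;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the delete call on pinotFS (not Pinot's real interface).
interface SimpleFS {
  boolean delete(URI uri, boolean forceDelete) throws Exception;
}

// Hypothetical in-memory file system, used only to exercise the sketch.
class InMemoryFS implements SimpleFS {
  final Set<URI> files = ConcurrentHashMap.newKeySet();

  @Override
  public boolean delete(URI uri, boolean forceDelete) {
    return files.remove(uri);
  }
}

// Remembers each temp file URI when it is created, so cleanup needs no directory listing.
class TempSegmentFileTracker {
  private final SimpleFS fs;
  private final Map<String, Set<URI>> tempFilesBySegment = new ConcurrentHashMap<>();

  TempSegmentFileTracker(SimpleFS fs) {
    this.fs = fs;
  }

  // Call this at the point where the temp file name is generated.
  void recordTempFile(String segmentName, URI tempFileUri) {
    tempFilesBySegment
        .computeIfAbsent(segmentName, k -> ConcurrentHashMap.newKeySet())
        .add(tempFileUri);
  }

  // Deletes only the recorded temp files for the segment; returns how many were deleted.
  int removeTempFiles(String segmentName) {
    Set<URI> uris = tempFilesBySegment.remove(segmentName);
    if (uris == null) {
      return 0;
    }
    int deleted = 0;
    for (URI uri : uris) {
      try {
        if (fs.delete(uri, true)) {
          deleted++;
        } else {
          System.err.println("Failed to delete file: " + uri);
        }
      } catch (Exception e) {
        System.err.println("Caught exception while deleting temp file " + uri + ": " + e);
      }
    }
    return deleted;
  }
}

class TempFileCleanupSketch {
  public static void main(String[] args) {
    InMemoryFS fs = new InMemoryFS();
    URI tmp = URI.create("s3://bucket/table/seg_0.tmp.1234");
    fs.files.add(tmp);

    TempSegmentFileTracker tracker = new TempSegmentFileTracker(fs);
    tracker.recordTempFile("seg_0", tmp);
    System.out.println("Deleted temp files: " + tracker.removeTempFiles("seg_0"));
  }
}
```

One trade-off of this sketch: the tracker lives in memory, so temp files left behind by a process that restarted before cleanup would not be found this way and would need a separate cleanup path.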


