snleee opened a new issue, #11633: URL: https://github.com/apache/pinot/issues/11633
`pinotFS.listFiles()` can be expensive when we use a remote file system (e.g. AWS S3) and the number of segments is very high (e.g. 100k). A high number of partitions makes the problem even worse. We have seen cases where `pinotFS.listFiles()` calls consume most of the threads available for API calls.

Instead of fetching the full list and filtering on `<segmentName>.tmp.` to detect the file to delete, we should keep the temp file name and directly try to delete the file based on that.

Current code:

```
try {
  // Lists every file under the table directory just to find this segment's temp files
  for (String uri : pinotFS.listFiles(tableDirURI, false)) {
    if (uri.contains(SegmentCompletionUtils.getSegmentNamePrefix(segmentName))) {
      LOGGER.warn("Deleting temporary segment file: {}", uri);
      Preconditions.checkState(pinotFS.delete(new URI(uri), true), "Failed to delete file: %s", uri);
    }
  }
} catch (Exception e) {
  LOGGER.warn("Caught exception while deleting temporary segment files for segment: {}", segmentName, e);
}
```
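A rough sketch of the proposed direction, for discussion only: remember the temp file URI at the time it is created, then delete exactly that URI without listing the table directory. `MinimalFS`, `InMemoryFS`, `TempSegmentCleanup` and `deleteTmpFile` are hypothetical names invented for this illustration; `MinimalFS` only mimics the `delete(URI, boolean)` call shown in the snippet above and is not the actual PinotFS interface. How the temp URI would be carried through the segment completion flow is left open here.

```java
import java.net.URI;
import java.util.HashSet;
import java.util.Set;

/**
 * Sketch: delete a temporary segment file by its recorded URI
 * instead of listing the whole table directory and filtering.
 */
class TempSegmentCleanup {

  /** Hypothetical stand-in for the one file-system call needed here (assumption, not the Pinot API). */
  interface MinimalFS {
    boolean delete(URI uri, boolean forceDelete) throws Exception;
  }

  /** In-memory implementation so the sketch can be exercised without a remote store. */
  static class InMemoryFS implements MinimalFS {
    final Set<URI> files = new HashSet<>();

    @Override
    public boolean delete(URI uri, boolean forceDelete) {
      // Returns true only if the file existed and was removed
      return files.remove(uri);
    }
  }

  /**
   * Deletes the temp file whose URI was recorded when the file was created.
   * Returns true if the file was deleted; false if no URI was recorded,
   * the file was not there, or the delete call failed.
   */
  static boolean deleteTmpFile(MinimalFS fs, URI recordedTmpUri) {
    if (recordedTmpUri == null) {
      return false;
    }
    try {
      // Single targeted call; no listFiles() over the table directory
      return fs.delete(recordedTmpUri, true);
    } catch (Exception e) {
      System.err.println("Caught exception while deleting temporary segment file: " + recordedTmpUri);
      return false;
    }
  }
}
```

With this shape the cleanup cost is one delete request per temp file, independent of how many segments sit in the table directory. The trade-off is that a temp file whose URI was never recorded (for example, one left behind by a crashed process) would not be found by this path and would need a separate cleanup.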