abhishekbafna commented on code in PR #16249:
URL: https://github.com/apache/pinot/pull/16249#discussion_r2185134422

##########
pinot-plugins/pinot-minion-tasks/pinot-minion-builtin-tasks/src/main/java/org/apache/pinot/plugin/minion/tasks/BaseMultipleSegmentsConversionExecutor.java:
##########
@@ -352,6 +348,67 @@ public List<SegmentConversionResult> executeTask(PinotTaskConfig pinotTaskConfig
     }
   }
 
+  private void parallelDownloadAndUntarSegments(int nThreads, String tableNameWithType, String taskType,
+      String[] segmentNames, String[] downloadURLs, File tempDataDir, AtomicInteger recordCounter,
+      List<File> inputSegmentDirs)
+      throws Exception {
+
+    ExecutorService executorService = null;
+    int length = downloadURLs.length;
+    try {
+      executorService = Executors.newFixedThreadPool(nThreads);
+      List<Future<Void>> futures = new ArrayList<>(length);
+      for (int i = 0; i < length; i++) {
+        int index = i;
+        futures.add(executorService.submit(() -> {
+          downloadAndUntarSegment(tableNameWithType, taskType, segmentNames[index], downloadURLs[index],
+              tempDataDir, index, recordCounter, inputSegmentDirs);
+          return null;
+        }));
+
+        // Wait for all downloads to complete and cancel other tasks if any download fails
+        for (Future<Void> future : futures) {
+          try {
+            future.get();
+          } catch (Exception e) {
+            // Cancel all other download tasks
+            for (Future<Void> f : futures) {
+              f.cancel(true);
+            }
+            throw e;
+          }
+        }
+      }
+    } finally {
+      if (executorService != null) {
+        executorService.shutdown();
+      }
+    }
+  }
+
+  private void downloadAndUntarSegment(String tableNameWithType, String taskType,
+      String segmentName, String downloadURL, File tempDataDir, int index, AtomicInteger recordCounter,
+      List<File> inputSegmentDirs)
+      throws Exception {
+    try {
+      _eventObserver.notifyProgress(_pinotTaskConfig, "Downloading and decompressing segment from: " + downloadURL
+          + " (" + (index + 1) + " out of " + inputSegmentDirs.size() + ")");
+      // Download and decompress the segment file
+      File indexDir = downloadSegmentToLocalAndUntar(tableNameWithType, segmentName, downloadURL, taskType,
+          tempDataDir, "_" + index);
+      reportSegmentDownloadMetrics(indexDir, tableNameWithType, taskType);
+      SegmentMetadata segmentMetadata = new SegmentMetadataImpl(indexDir);
+      // Ensure segment directory placement is at same index as the segment name in the inputSegmentNames
+      inputSegmentDirs.set(index, indexDir);
+      recordCounter.addAndGet(segmentMetadata.getTotalDocs());
+    } catch (Exception e) {

Review Comment:
   I think this is better. It keeps the logic together and ensures that `inputSegmentDirs` and `recordCounter` are only updated when everything above succeeds. `new SegmentMetadataImpl(indexDir)` can also throw an IOException, among other exceptions.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

For additional commands, e-mail: commits-h...@pinot.apache.org
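
To illustrate the ordering argued for in the review comment, here is a minimal, self-contained sketch. It is not the PR's code: `CommitAfterSuccessSketch`, `readTotalDocs`, and `processSegment` are hypothetical names, and the "metadata read" is simulated with an integer so the example runs without Pinot on the classpath. The point it tries to show is that the shared list and counter are touched only after every step that can throw has completed.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class CommitAfterSuccessSketch {

  // Hypothetical stand-in for a metadata read that may fail.
  // A negative value simulates unreadable metadata.
  static int readTotalDocs(int simulatedDocs) throws IOException {
    if (simulatedDocs < 0) {
      throw new IOException("simulated metadata read failure");
    }
    return simulatedDocs;
  }

  // Runs all fallible steps first, then updates shared state.
  static void processSegment(int index, int simulatedDocs, List<String> segmentDirs,
      AtomicInteger recordCounter) throws IOException {
    // Fallible steps (a real download would also go here).
    String segmentDir = "segment_" + index;
    int totalDocs = readTotalDocs(simulatedDocs);

    // Reached only if nothing above threw, so both updates happen together or not at all.
    segmentDirs.set(index, segmentDir);
    recordCounter.addAndGet(totalDocs);
  }

  public static void main(String[] args) {
    // Fixed-size list pre-filled with nulls so set(index, ...) is valid.
    List<String> segmentDirs = Arrays.asList(new String[2]);
    AtomicInteger recordCounter = new AtomicInteger();
    int[] simulatedDocs = {10, -1}; // the second segment fails

    for (int i = 0; i < simulatedDocs.length; i++) {
      try {
        processSegment(i, simulatedDocs[i], segmentDirs, recordCounter);
      } catch (IOException e) {
        System.out.println("segment " + i + " failed: " + e.getMessage());
      }
    }
    System.out.println("dirs=" + segmentDirs + ", records=" + recordCounter.get());
  }
}
```

With this ordering, a failure while reading metadata leaves the slot for that segment unset and the counter unchanged, which is the property the comment asks for.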