Jackie-Jiang commented on code in PR #8812:
URL: https://github.com/apache/pinot/pull/8812#discussion_r890419704

##########
pinot-plugins/pinot-batch-ingestion/pinot-batch-ingestion-standalone/src/main/java/org/apache/pinot/plugin/ingestion/batch/standalone/SegmentGenerationJobRunner.java:
##########
@@ -168,7 +173,6 @@ public void run()
     //Get list of files to process
     String[] files = _inputDirFS.listFiles(_inputDirURI, true);
-    //TODO: sort input files based on creation time

Review Comment:
   Let's keep this TODO

##########
pinot-plugins/pinot-batch-ingestion/pinot-batch-ingestion-standalone/src/main/java/org/apache/pinot/plugin/ingestion/batch/standalone/SegmentGenerationJobRunner.java:
##########
@@ -160,6 +162,9 @@ public void init(SegmentGenerationJobSpec spec) {
     LOGGER.info("Creating an executor service with {} threads(Job parallelism: {}, available cores: {}.)", numThreads,
         jobParallelism, Runtime.getRuntime().availableProcessors());
     _executorService = Executors.newFixedThreadPool(numThreads);
+
+    // Set up for recording multiple failures while building segments.

Review Comment:
   (minor) The comment is a little bit confusing. Suggest updating it to reflect that we record the first failure

##########
pinot-plugins/pinot-batch-ingestion/pinot-batch-ingestion-standalone/src/main/java/org/apache/pinot/plugin/ingestion/batch/standalone/SegmentGenerationJobRunner.java:
##########
@@ -253,6 +264,15 @@ private void submitSegmentGenTask(File localTempDir, URI inputFileURI, int seqId
     taskSpec.setFailOnEmptySegment(_spec.isFailOnEmptySegment());
     taskSpec.setCustomProperty(BatchConfigProperties.INPUT_DATA_FILE_URI_KEY, inputFileURI.toString());
+    // If there's already been a failure, log and skip this file. Do this check right before the
+    // submit to reduce odds of starting a new segment when a failure is recorded right before the
+    // submit.
+    if (_failure.get() != null) {
+      LOGGER.info("Skipping Segment Generation Task for {} due to previous failures", inputFileURI);
+      _segmentCreationTaskCountDownLatch.countDown();

Review Comment:
   (minor) This count down is not required because the previous failure should already drain it

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
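
The last two comments both rest on the same pattern: only the first failure is recorded, and the task that records it drains the latch, so a later skipped task has nothing left to count down. The sketch below illustrates that pattern under stated assumptions. It is not the PR's code: the class `FirstFailureTracker` and all of its methods are hypothetical, and only the field name `_failure` echoes the diff. It also assumes the failure path drains the latch to zero, which is what the review comment implies but which is not shown in the quoted hunks.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical helper for illustration only; not part of Pinot.
class FirstFailureTracker {
  // Holds the first failure only; later failures do not overwrite it.
  private final AtomicReference<Exception> _failure = new AtomicReference<>();
  private final CountDownLatch _latch;

  FirstFailureTracker(int numTasks) {
    _latch = new CountDownLatch(numTasks);
  }

  // Called by a task that finished normally.
  void taskSucceeded() {
    _latch.countDown();
  }

  // Called by a task that threw. Only the first caller wins the compareAndSet,
  // and that caller drains the latch so any thread in await() is released.
  void taskFailed(Exception e) {
    if (_failure.compareAndSet(null, e)) {
      while (_latch.getCount() > 0) {
        _latch.countDown();
      }
    }
  }

  // Checked right before submitting a new task. A skipped task does not need
  // its own countDown(): the failing task drains the latch to zero (assumption).
  boolean shouldSkip() {
    return _failure.get() != null;
  }

  Exception firstFailure() {
    return _failure.get();
  }

  long remaining() {
    return _latch.getCount();
  }
}
```

Calling `countDown()` on a latch that is already at zero is a no-op, so the extra call in the diff would be harmless, but under this assumption it is also redundant, which matches the "(minor)" label on the comment.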