Jackie-Jiang commented on code in PR #8812:
URL: https://github.com/apache/pinot/pull/8812#discussion_r890419704


##########
pinot-plugins/pinot-batch-ingestion/pinot-batch-ingestion-standalone/src/main/java/org/apache/pinot/plugin/ingestion/batch/standalone/SegmentGenerationJobRunner.java:
##########
@@ -168,7 +173,6 @@ public void run()
     //Get list of files to process
     String[] files = _inputDirFS.listFiles(_inputDirURI, true);
 
-    //TODO: sort input files based on creation time

Review Comment:
   Let's keep this TODO



##########
pinot-plugins/pinot-batch-ingestion/pinot-batch-ingestion-standalone/src/main/java/org/apache/pinot/plugin/ingestion/batch/standalone/SegmentGenerationJobRunner.java:
##########
@@ -160,6 +162,9 @@ public void init(SegmentGenerationJobSpec spec) {
     LOGGER.info("Creating an executor service with {} threads(Job parallelism: {}, available cores: {}.)", numThreads,
         jobParallelism, Runtime.getRuntime().availableProcessors());
     _executorService = Executors.newFixedThreadPool(numThreads);
+
+    // Set up for recording multiple failures while building segments.

Review Comment:
   (minor) The comment is a little confusing. Suggest updating it to reflect that only the first failure is recorded.
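   For illustration, here is a minimal sketch of the "record only the first failure" pattern that the comment could describe. The diff only shows `_failure.get()`, so treating `_failure` as an `AtomicReference<Throwable>` is an assumption. The class and method names (`FirstFailureRecorder`, `record`, `getFirstFailure`) are hypothetical and not taken from the PR.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical helper: keeps only the first failure reported by any segment generation task. */
public class FirstFailureRecorder {
  // Records only the first failure hit while building segments; once it is set,
  // the remaining input files are skipped. (Assumed type: AtomicReference<Throwable>.)
  private final AtomicReference<Throwable> _failure = new AtomicReference<>();

  /** Stores t only if no failure has been recorded yet; returns true if t was the first one. */
  public boolean record(Throwable t) {
    // compareAndSet(null, t) succeeds for exactly one caller, even under concurrency.
    return _failure.compareAndSet(null, t);
  }

  /** Returns the first recorded failure, or null if every task has succeeded so far. */
  public Throwable getFirstFailure() {
    return _failure.get();
  }

  public static void main(String[] args) {
    FirstFailureRecorder recorder = new FirstFailureRecorder();
    recorder.record(new RuntimeException("first"));
    recorder.record(new RuntimeException("second")); // ignored: a failure is already recorded
    System.out.println(recorder.getFirstFailure().getMessage()); // prints "first"
  }
}
```

   Using `compareAndSet` instead of an unconditional `set` is what makes "first" accurate: later failures from other threads cannot overwrite the one that triggered the skip.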



##########
pinot-plugins/pinot-batch-ingestion/pinot-batch-ingestion-standalone/src/main/java/org/apache/pinot/plugin/ingestion/batch/standalone/SegmentGenerationJobRunner.java:
##########
@@ -253,6 +264,15 @@ private void submitSegmentGenTask(File localTempDir, URI inputFileURI, int seqId
     taskSpec.setFailOnEmptySegment(_spec.isFailOnEmptySegment());
     taskSpec.setCustomProperty(BatchConfigProperties.INPUT_DATA_FILE_URI_KEY, inputFileURI.toString());
 
+    // If there's already been a failure, log and skip this file. Do this check right before the
+    // submit to reduce odds of starting a new segment when a failure is recorded right before the
+    // submit.
+    if (_failure.get() != null) {
+      LOGGER.info("Skipping Segment Generation Task for {} due to previous failures", inputFileURI);
+      _segmentCreationTaskCountDownLatch.countDown();

Review Comment:
   (minor) This `countDown()` is not required because the previous failure should already have drained the latch.
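   A minimal, single-threaded sketch of why the extra `countDown()` is redundant, assuming (as the comment implies) that the failure path counts `_segmentCreationTaskCountDownLatch` all the way down to zero. `LatchDrainSketch`, `onTaskFailure`, and `trySubmit` are hypothetical names, and tasks run inline here as a stand-in for `_executorService.submit(...)`.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical sketch: the first failure drains the latch, so skipped tasks need not count down. */
public class LatchDrainSketch {
  private final AtomicReference<Throwable> _failure = new AtomicReference<>();
  private final CountDownLatch _latch;
  private int _skipped = 0;

  public LatchDrainSketch(int numTasks) {
    _latch = new CountDownLatch(numTasks);
  }

  /** Assumed failure path: only the first failure drains the latch so the waiting thread wakes up. */
  void onTaskFailure(Throwable t) {
    if (_failure.compareAndSet(null, t)) {
      while (_latch.getCount() > 0) {
        _latch.countDown(); // countDown() on a zero count is a no-op, so racing here is harmless
      }
    }
  }

  /** Returns false if the task was skipped because a failure was already recorded. */
  public boolean trySubmit(boolean fails) {
    if (_failure.get() != null) {
      _skipped++; // no countDown() needed: the latch is already at zero
      return false;
    }
    // Stand-in for _executorService.submit(...): run the task inline.
    if (fails) {
      onTaskFailure(new RuntimeException("segment build failed"));
    } else {
      _latch.countDown();
    }
    return true;
  }

  public long getLatchCount() {
    return _latch.getCount();
  }

  public int getSkipped() {
    return _skipped;
  }

  public static void main(String[] args) throws InterruptedException {
    LatchDrainSketch job = new LatchDrainSketch(5);
    job.trySubmit(false); // task 1 succeeds: count 5 -> 4
    job.trySubmit(true);  // task 2 fails: latch drained 4 -> 0
    job.trySubmit(false); // tasks 3..5 are skipped without touching the latch
    job.trySubmit(false);
    job.trySubmit(false);
    boolean done = job._latch.await(1, TimeUnit.SECONDS); // returns immediately
    System.out.println(job.getLatchCount() + " " + job.getSkipped() + " " + done); // prints "0 3 true"
  }
}
```

   Keeping the extra `countDown()` would not break anything, since counting down a latch that is already at zero does nothing, but it suggests the skip path is responsible for releasing the waiter when the drain already does that.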



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
