jackjlli commented on pull request #8098:
URL: https://github.com/apache/pinot/pull/8098#issuecomment-1026332309


   > What if a user wants to replace an existing segment with a newly generated
   > one? We should allow overriding existing segments, but not the newly pushed ones.
   
   If a user wants to replace an existing segment, they can still use the
current logic to do that. I've updated the logic of the PR to validate the
number of input and output files. Since there is a 1:1 mapping between the input
and output files, we should fail the job if these two numbers don't match.
   The previous logic that set the `overwrite` flag doesn't work because the
destination is just a temp dir for each mapper. The actual merge step from the
mapper temp dir to the final output dir is done inside the `commitTask` method
of the `FileOutputCommitter` class, which is outside the scope of our MR job.
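
For illustration, the input/output count check described above could look roughly like the sketch below. This is not the code in the PR: `SegmentCountValidator`, `countFiles`, and `validate` are hypothetical names, and the sketch uses `java.nio.file` on a local directory instead of the Hadoop `FileSystem` API so that it stays self-contained.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

/** Hypothetical helper for illustration only; not the code from the PR. */
class SegmentCountValidator {

  /** Counts regular files directly under dir whose names end with the given suffix. */
  static long countFiles(Path dir, String suffix) {
    try (Stream<Path> entries = Files.list(dir)) {
      return entries
          .filter(Files::isRegularFile)
          .filter(p -> p.getFileName().toString().endsWith(suffix))
          .count();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Fails when the assumed 1:1 mapping between input and output files does not hold. */
  static void validate(long numInputFiles, long numOutputFiles) {
    if (numInputFiles != numOutputFiles) {
      throw new IllegalStateException(
          "Expected " + numInputFiles + " output segment files but found " + numOutputFiles);
    }
  }
}
```

A driver would call something like `validate(numInputFiles, countFiles(outputDir, ".tar.gz"))` after the job finishes and treat the exception as a job failure.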
   
   Sample log:
   ```
   2022-01-31 21:49:15,627 INFO [main] org.apache.pinot.hadoop.job.mappers.HadoopSegmentCreationMapper: Copying segment tar file from: pinot_hadoop_tmp/segmentTar/jobAnalyticsV1LiteOfflineEvents_jobAnalyticsV1LiteOfflineEvents_daily_2022-01-25_2022-01-25.tar.gz to: hdfs://path1/pinot_segments/3f5420af-6422-4035-9d53-2dd1895c2747/output/_temporary/1/_temporary/attempt_1632281309592_18465840_m_000001_0/segmentTar/table1_2022-01-25_2022-01-25.tar.gz
   ...
   2022-01-31 21:49:17,484 INFO [main] org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: Saved output of task 'attempt_1632281309592_18465840_m_000001_0' to hdfs://path1/pinot_segments/3f5420af-6422-4035-9d53-2dd1895c2747/output
   
   ```
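
The two paths in the log can be read as a two-stage write: the mapper copies into a per-attempt temp dir, and a separate commit step later moves that output into the final dir. The toy model below mimics that layout with `java.nio.file` on a local directory. It is not Hadoop's `FileOutputCommitter`; `TaskCommitSketch` and its methods are hypothetical, and the replace-on-merge behavior in `commitAttempt` is an assumption made for the sketch.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Toy model of the attempt-dir / commit split; not Hadoop's FileOutputCommitter. */
class TaskCommitSketch {

  /** Mirrors the temp layout seen in the log: output/_temporary/1/_temporary/<attempt>/segmentTar. */
  static Path attemptDir(Path outputDir, String attemptId) {
    return outputDir.resolve("_temporary").resolve("1").resolve("_temporary")
        .resolve(attemptId).resolve("segmentTar");
  }

  /** Mapper side: any overwrite here only touches this attempt's private dir. */
  static Path writeToAttemptDir(Path outputDir, String attemptId, String fileName, byte[] data) {
    try {
      Path dir = attemptDir(outputDir, attemptId);
      Files.createDirectories(dir);
      return Files.write(dir.resolve(fileName), data);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Commit side: merges one attempt into the final dir (assumed to replace existing files). */
  static void commitAttempt(Path outputDir, String attemptId) {
    try {
      Path finalDir = outputDir.resolve("segmentTar");
      Files.createDirectories(finalDir);
      List<Path> files;
      try (Stream<Path> entries = Files.list(attemptDir(outputDir, attemptId))) {
        files = entries.collect(Collectors.toList());
      }
      for (Path file : files) {
        Files.move(file, finalDir.resolve(file.getFileName()),
            StandardCopyOption.REPLACE_EXISTING);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

In this model, two attempts that produce the same segment file name never see each other's file while writing, and only one file remains after both are committed. Under these assumptions, that is the case a comparison of input and output file counts would catch.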




