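jackjlli commented on pull request #8098:
URL: https://github.com/apache/pinot/pull/8098#issuecomment-1026332309

> What if user wants to replace an existing segment with a new generated one? We should allow overriding existing segment, but not the newly pushed ones.

If a user wants to replace an existing segment, they can still use the current logic to do that.

I've updated the PR logic to validate the number of input and output files. Since there is a 1:1 mapping between input and output files, the job should fail if these two numbers don't match.

The previous logic that set the `overwrite` flag doesn't work, because the copy destination is only a temp dir for each mapper. The actual merge step from the mapper's temp dir into the final output dir is done inside the `commitTask` method of the `FileOutputCommitter` class, which is outside the scope of our MR job.

Sample log. The mapper copies the segment tar file into its task-attempt directory under `output/_temporary/`, and `FileOutputCommitter` then saves the task output to the final `output` directory:

```
2022-01-31 21:49:15,627 INFO [main] org.apache.pinot.hadoop.job.mappers.HadoopSegmentCreationMapper: Copying segment tar file from: pinot_hadoop_tmp/segmentTar/jobAnalyticsV1LiteOfflineEvents_jobAnalyticsV1LiteOfflineEvents_daily_2022-01-25_2022-01-25.tar.gz to: hdfs://path1/pinot_segments/3f5420af-6422-4035-9d53-2dd1895c2747/output/_temporary/1/_temporary/attempt_1632281309592_18465840_m_000001_0/segmentTar/table1_2022-01-25_2022-01-25.tar.gz
...
2022-01-31 21:49:17,484 INFO [main] org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: Saved output of task 'attempt_1632281309592_18465840_m_000001_0' to hdfs://path1/pinot_segments/3f5420af-6422-4035-9d53-2dd1895c2747/output
```

--
This is an automated message from the Apache Git Service.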
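A minimal sketch of the input/output count check described in the comment, written in plain Java so it runs without Hadoop on the classpath. The class and method names (`SegmentCountValidator`, `validateOneToOne`) and the sample file names are hypothetical and are not taken from the PR. The sketch assumes the caller has already listed the input files and the segment tar files in the final output directory after the commit step.

```java
import java.util.List;

// Hypothetical sketch, not the code from PR #8098: fail the job when the
// number of segment tar files in the final output directory differs from the
// number of input files, relying on the expected 1:1 input/output mapping.
class SegmentCountValidator {

    // Throws IllegalStateException if the two counts differ.
    static void validateOneToOne(List<String> inputFiles, List<String> outputSegmentTars) {
        int numInputs = inputFiles.size();
        int numOutputs = outputSegmentTars.size();
        if (numInputs != numOutputs) {
            throw new IllegalStateException(String.format(
                    "Expected %d segment tar files (one per input file) but found %d",
                    numInputs, numOutputs));
        }
    }

    public static void main(String[] args) {
        // Two inputs, two outputs: the check passes.
        validateOneToOne(List.of("part-0.avro", "part-1.avro"),
                List.of("table1_0.tar.gz", "table1_1.tar.gz"));

        // Two inputs but only one output. Assumed scenario: both segments got
        // the same name, so one replaced the other when merged into the output dir.
        try {
            validateOneToOne(List.of("part-0.avro", "part-1.avro"),
                    List.of("table1_2022-01-25_2022-01-25.tar.gz"));
        } catch (IllegalStateException e) {
            System.out.println("Job failed: " + e.getMessage());
        }
    }
}
```

Comparing counts alone is enough here only because each input file is expected to produce exactly one segment tar file. If that mapping ever changed, the check would need to compare segment names instead.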
> What if user wants to replace an existing segment with a new generated one? We should allow overriding existing segment, but not the newly pushed ones. If user wants to replace an existing segment, he/she can still use the current logic to do that. I've update the logic of the PR to validate the number of input and output files. Since there is 1:1 mapping between the input and output files, if these two number doesn't match, we should fail the job. The previous logic that sets `overwrite` flag doesn't work as the destination is just a temp dir for each of the mapper. The actual merge step from mapper temp dir to final output dir is done inside the `commitTask` method of `FileOutputCommitter` class, which is out of the scope of our MR job. Sample log: ``` 2022-01-31 21:49:15,627 INFO [main] org.apache.pinot.hadoop.job.mappers.HadoopSegmentCreationMapper: Copying segment tar file from: pinot_hadoop_tmp/segmentTar/jobAnalyticsV1LiteOfflineEvents_jobAnalyticsV1LiteOfflineEvents_daily_2022-01-25_2022-01-25.tar.gz to: hdfs://path1/pinot_segments/3f5420af-6422-4035-9d53-2dd1895c2747/output/_temporary/1/_temporary/attempt_1632281309592_18465840_m_000001_0/segmentTar/table1_2022-01-25_2022-01-25.tar.gz ... 2022-01-31 21:49:17,484 INFO [main] org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: Saved output of task 'attempt_1632281309592_18465840_m_000001_0' to hdfs://path1/pinot_segments/3f5420af-6422-4035-9d53-2dd1895c2747/output ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org