fx19880617 commented on issue #6349:
URL: 
https://github.com/apache/incubator-pinot/issues/6349#issuecomment-749282776


   1. For the ingestion job, keeping the segments in the output directory is 
by design. For URI and METADATA push jobs, the output directory is treated as 
the source of truth for the segments. E.g. users run this job to generate 
segments and write them directly to S3, then push metadata to Pinot so that it 
loads the segments from the same S3 directory. 
   
   I think it's ok to add a config like `cleanUpOutputDir` that deletes the 
output directory when the push mode is `TAR`. Its default value should be false. 
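
   A minimal sketch of how such a flag could be honored, assuming a 
hypothetical `OutputDirCleanup` helper and `PushMode` enum (neither name comes 
from the Pinot code base; only the `cleanUpOutputDir` name is from the proposal 
above). Cleanup happens only for `TAR` push, so URI and METADATA jobs keep their 
source-of-truth directory even if the flag is set by mistake:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch: class, enum, and method names are illustrative only.
class OutputDirCleanup {
  enum PushMode { TAR, URI, METADATA }

  // Clean up only when the flag is set AND the push mode is TAR.
  // For URI and METADATA push the output dir is the source of truth.
  static boolean shouldCleanUp(PushMode mode, boolean cleanUpOutputDir) {
    return cleanUpOutputDir && mode == PushMode.TAR;
  }

  // Recursively deletes outputDir if the policy above allows it.
  static void cleanUpIfAllowed(Path outputDir, PushMode mode, boolean cleanUpOutputDir)
      throws IOException {
    if (!shouldCleanUp(mode, cleanUpOutputDir) || !Files.exists(outputDir)) {
      return;
    }
    List<Path> toDelete;
    try (Stream<Path> paths = Files.walk(outputDir)) {
      // Reverse order puts children before their parent directories.
      toDelete = paths.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
    }
    for (Path p : toDelete) {
      Files.delete(p);
    }
  }
}
```

   With the default of `false`, existing jobs would behave exactly as they do 
today.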
   
   2. We usually expect the ingestion job output directory to be empty, but you 
are right: if segments are already there, or are still being built, then the 
push job will push them all. 
   
   To solve this I feel we can:
   - Merge segment generation and push into one task;
   - Let the segment generation job return an array of generated tar file URIs;
   - Let the push task take that array and push only those files.
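
   The steps above could look roughly like the following sketch. All class and 
method names here are hypothetical and the build and upload steps are stubbed 
out. The point is only the data flow: the push step receives an explicit list of 
URIs from the generation step instead of listing the output directory, so 
leftover or in-progress segments are never picked up.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: names are illustrative, not Pinot APIs.
class GenerateThenPush {

  // Stand-in for segment generation. Returns the URIs of the tar files
  // produced by THIS run only. outputDir is assumed to end with '/'.
  static List<URI> generateSegments(URI outputDir, List<String> segmentNames) {
    List<URI> generated = new ArrayList<>();
    for (String name : segmentNames) {
      // A real job would build and tar the segment here.
      generated.add(outputDir.resolve(name + ".tar.gz"));
    }
    return generated;
  }

  // Stand-in for the push step. It works only on the URIs it is handed and
  // never scans the output directory.
  static List<URI> pushSegments(List<URI> generatedTarUris) {
    List<URI> pushed = new ArrayList<>();
    for (URI tarUri : generatedTarUris) {
      // A real job would upload the tar or send its URI/metadata here.
      pushed.add(tarUri);
    }
    return pushed;
  }

  // The merged task: generation followed by push of exactly what was generated.
  static List<URI> run(URI outputDir, List<String> segmentNames) {
    return pushSegments(generateSegments(outputDir, segmentNames));
  }
}
```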
   
   




