plaisted opened a new issue #6349:
URL: https://github.com/apache/incubator-pinot/issues/6349


   When running an ingestion job using the 'standalone' execution framework,
the files written to 'outputDirURI' persist after the job completes. A couple
of issues arise from this:
   
   - Subsequent ingestion runs load the leftover files from previous runs in
addition to the files for the current run
   - If concurrent jobs run with the same storage location, they will attempt
to load each other's files
   
   I haven't dug into the code, but it seems like the job should:
   - clean up after itself
   - only load the segments from the outputDirURI that it created in that job
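   A minimal Python sketch of the two behaviors proposed above, assuming a
local filesystem output directory. The function run_ingestion_job, the
run-<uuid> subdirectory layout, and the simulated segment build and push are
all hypothetical illustrations, not Pinot's job runner or API: each run writes
into its own subdirectory, pushes only the files it created, and removes that
subdirectory when it finishes.

```python
import shutil
import tempfile
import uuid
from pathlib import Path


def run_ingestion_job(output_root: Path, input_names: list[str]) -> list[str]:
    """Hypothetical job runner that isolates its outputs and cleans up after itself.

    Segment generation is simulated by writing one placeholder file per input;
    a real job would build segments here instead.
    """
    # A unique per-run directory keeps concurrent jobs sharing output_root apart.
    run_dir = output_root / f"run-{uuid.uuid4().hex}"
    run_dir.mkdir(parents=True)
    created: list[Path] = []
    try:
        for name in input_names:
            segment = run_dir / f"{name}.tar.gz"
            segment.write_bytes(b"placeholder segment")
            created.append(segment)
        # Push only the files this run created, never a listing of output_root.
        pushed = [p.name for p in created]  # stand-in for a real push step
        return pushed
    finally:
        # Remove this run's outputs whether or not the push succeeded.
        shutil.rmtree(run_dir, ignore_errors=True)


if __name__ == "__main__":
    # A stale file left in the shared root by an earlier run is not pushed.
    root = Path(tempfile.mkdtemp())
    (root / "stale.tar.gz").write_bytes(b"old")
    print(run_ingestion_job(root, ["a", "b"]))
    print(sorted(p.name for p in root.iterdir()))
```

   Tracking the created files explicitly, rather than listing the output
directory, is what keeps leftovers and other jobs' files out of the push.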
   
   If there are reasons why it shouldn't or can't do this, additional
documentation on the behavior and purpose of outputDirURI in standalone jobs
would help call out the cleanup and URI uniqueness requirements.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.