plaisted opened a new issue #6349: URL: https://github.com/apache/incubator-pinot/issues/6349
When running an ingestion job using the 'standalone' execution framework, the files written to 'outputDirURI' persist after the job completes. A couple of issues arise from this:

- Subsequent ingestion runs add the leftover files from previous runs in addition to the files for the current run.
- If concurrent jobs are running with the same storage location, they would attempt to load each other's files.

I haven't dug into the code, but it seems like the job should:

- clean up after itself
- only load segments from the outputDirURI that it created in the job

If there are reasons why it shouldn't or can't do this, additional documentation on the behavior and purpose of the outputDirURI with standalone jobs would be helpful, to call out the cleanup and URI uniqueness requirements.
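
Until the job handles this itself, a caller-side workaround for the two problems described in this issue might look like the sketch below. It is not Pinot code and does not call any Pinot API. It assumes the output location is a local filesystem path, and the helper name `isolated_output_dir` and the `run-<uuid>` directory naming are hypothetical. Each run gets its own subdirectory, so concurrent runs cannot see each other's files, and the subdirectory is deleted when the run ends, so nothing is left over for the next run.

```python
import shutil
import uuid
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def isolated_output_dir(base_dir):
    """Yield a fresh per-run subdirectory of base_dir and delete it afterwards.

    Hypothetical helper: the name and the "run-<uuid>" layout are assumptions
    made for illustration, not something Pinot provides.
    """
    # A random suffix keeps concurrent runs under the same base_dir apart.
    run_dir = Path(base_dir).resolve() / f"run-{uuid.uuid4().hex}"
    # exist_ok=False makes a (very unlikely) name collision fail loudly.
    run_dir.mkdir(parents=True, exist_ok=False)
    try:
        yield run_dir
    finally:
        # Remove this run's files whether the job succeeded or failed.
        shutil.rmtree(run_dir, ignore_errors=True)


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as base:
        with isolated_output_dir(base) as out:
            # The caller would use this value as outputDirURI for one run
            # and launch the ingestion job here (launching is not shown).
            print(out.as_uri())
```

The directory is removed even when the job fails, which may not be what you want if the generated segments are needed for debugging. In that case the cleanup in the `finally` block could be made conditional.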