kkrugler edited a comment on pull request #7222: URL: https://github.com/apache/pinot/pull/7222#issuecomment-888686908
Hi @Jackie-Jiang - a few things about this WIP...

1. The `inputFilePath` that I'm getting in the `SegmentGeneratorConfig`, which is passed to `SegmentIndexCreationDriverImpl`, is a path to a local temp directory where the input file has been copied. So the real input path has been lost in that case, which means that if the input file path pattern matches anything other than the file name, it won't work. I see that `SegmentGenerationJobRunner.submitSegmentGenTask()` (in pinot-batch-ingestion-standalone) is where the input file gets copied to the local temp dir. I assume this is so that the input data can be read from a regular Java `File` instead of needing to use the abstract file system APIs everywhere. I'm wondering if it's worthwhile to at least try to replicate (say for 2-3 levels) the input file hierarchy inside the temp input dir, though the same change would need to be made to the Hadoop & Spark segment generation code too.
2. I didn't want to touch the `pom.xml` files, but my build was failing without those changes, which make Apache RAT and the Apache license checker skip Eclipse-generated files. I also had to do some dependency management to avoid a build failure caused by the Pinot integration tests pulling in some local cloud test code that used a different version of the AWS SDK jars. I could try to split those changes into a separate PR.
3. I noticed a few other bits I should clean up when reviewing the committed files (e.g. Javadoc for `InputFileSegmentNameGenerator`, cleaning up the use of `@Nullable` in arguments). I've made those changes in my branch; I just haven't pushed yet to update the PR.

Anyway, I'm looking for input on items 1 & 2 above.
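To make item 1 concrete, here is a minimal sketch of what "replicate 2-3 levels of the input hierarchy" could look like when choosing the local copy destination. It is not Pinot code: the class name `PreserveInputHierarchy`, the helper `localCopyPath`, the `dirLevels` parameter, the example S3 URI and the date-capturing regex are all hypothetical, and it assumes Unix-style paths. It shows that a pattern relying on parent directories fails against a flattened temp copy but matches when the last few directory levels are kept.

```java
import java.net.URI;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PreserveInputHierarchy {

  // Hypothetical helper: keeps the file name plus up to `dirLevels` parent directories
  // of the original input URI when choosing where to copy it inside the local temp dir.
  static Path localCopyPath(Path localTempDir, URI inputFileUri, int dirLevels) {
    Path original = Paths.get(inputFileUri.getPath());
    int count = original.getNameCount();
    int start = Math.max(0, count - dirLevels - 1);
    return localTempDir.resolve(original.subpath(start, count));
  }

  public static void main(String[] args) {
    Path tempDir = Paths.get("/tmp/pinot-ingest");
    URI input = URI.create("s3://bucket/events/2021/07/28/part-00000.avro");

    // Flattened copy (file name only), similar to what the comment describes today.
    Path flat = tempDir.resolve(Paths.get(input.getPath()).getFileName());
    // Copy that keeps three directory levels above the file.
    Path nested = localCopyPath(tempDir, input, 3);

    // Illustrative pattern that derives a date from the directory structure.
    Pattern pattern = Pattern.compile(".*/(\\d{4})/(\\d{2})/(\\d{2})/[^/]+\\.avro");
    System.out.println(flat + " matches: " + pattern.matcher(flat.toString()).matches());
    Matcher m = pattern.matcher(nested.toString());
    if (m.matches()) {
      System.out.println(nested + " -> events_" + m.group(1) + m.group(2) + m.group(3));
    }
  }
}
```

With the flattened copy the pattern cannot match, while the nested copy (`/tmp/pinot-ingest/2021/07/28/part-00000.avro`) yields `events_20210728`. A fixed depth keeps the temp layout bounded and predictable, at the cost of not supporting patterns that look further up the original path.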