kkrugler edited a comment on pull request #7222:
URL: https://github.com/apache/pinot/pull/7222#issuecomment-888686908


   Hi @Jackie-Jiang - a few things about this WIP...
   
   1. The inputFilePath that I'm getting inside the 
`SegmentGeneratorConfig`, which is passed to `SegmentIndexCreationDriverImpl` 
(at least when running locally), is a path to a temp directory where the input 
file has been copied. So the real input path has been lost in that case, which 
means that if the input file path pattern matches anything other than the file 
name, it won't work. I see that 
`SegmentGenerationJobRunner.submitSegmentGenTask()` (in 
pinot-batch-ingestion-standalone) is where the input file gets copied to the 
local temp dir. I assume this is so that the input data can be read from a 
regular Java File vs. needing to use abstract FileSystem stuff everywhere. 
I'm wondering if it's worthwhile to at least try to replicate (say, for 2-3 
levels) the input file hierarchy inside the temp input dir.
   2. I didn't want to touch the `pom.xml` files, but my build was failing 
without those changes, which make rat and the Apache license checker skip 
Eclipse-generated files. I also had to do some dependency management to avoid a 
build failure caused by the Pinot integration tests pulling in some local cloud 
test code that used a different version of the AWS SDK jars. I could try to 
split those changes into a separate PR.
   3. I noticed a few other bits I should clean up when reviewing the committed 
files (e.g. Javadoc for `InputFileSegmentNameGenerator`, cleaning up use of 
`@Nullable` in arguments). I've made those changes in my branch but haven't 
pushed them yet to update the PR.
   
   Anyway, looking for input on items 1 & 2 above.
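
   To make item 1 concrete, here is a rough sketch of what replicating a few 
levels of the input hierarchy could look like. It is not code from the PR: the 
class `TempInputPathSketch`, the helper `replicateHierarchy`, and the example 
paths are all hypothetical, and it assumes the input location can be treated as 
a `java.nio.file.Path` with at least a file name component.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical sketch, not part of Pinot or of this PR.
class TempInputPathSketch {

  // Returns a location under tempInputDir that keeps the file name plus up to
  // maxParentLevels of its trailing parent directories.
  static Path replicateHierarchy(Path tempInputDir, Path inputFile, int maxParentLevels) {
    int nameCount = inputFile.getNameCount();
    // Index of the first path element to keep; clamp at 0 for shallow paths.
    int start = Math.max(0, nameCount - 1 - maxParentLevels);
    return tempInputDir.resolve(inputFile.subpath(start, nameCount));
  }

  public static void main(String[] args) {
    // Example paths are made up for illustration.
    Path local = replicateHierarchy(
        Paths.get("/tmp/pinot-input"),
        Paths.get("/data/events/2021/07/28/part-0.avro"),
        3);
    System.out.println(local);
  }
}
```

   With something like this, a pattern that matches on the last few directory 
names (a date partition, say) would still see them in the temp copy, while 
anything higher up in the original path would still be lost.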


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
