kkrugler opened a new issue #8141: URL: https://github.com/apache/pinot/issues/8141
We do daily builds of offline segments using Hadoop and store the results in HDFS, in the directory configured as our Pinot cluster's deep store. Each day the build generates 35 new (or, more typically, updated) per-month segments, which we then deploy to our Pinot cluster via a metadata push. As a result, the deep store directory in HDFS holds roughly 1,200 segments (representing 3 years of data) for the table. When we do the metadata push, every segment is downloaded, its metadata is extracted, and that metadata tarball is sent to the controller. This currently takes about 3 hours, even though we only want to push the 35 new segments.

A simple solution would be to support a new, optional `pushFileNamePattern` parameter in the job conf, which could be used to filter down to only the segments we care about. The format could be the same as the existing `includeFileNamePattern`.
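For illustration, a minimal sketch of how the proposed parameter might look in a batch ingestion job spec. The `pushFileNamePattern` key is hypothetical (it is the feature being requested here); `includeFileNamePattern`, `jobType`, `outputDirURI`, and `pinotClusterSpecs` follow the existing Pinot job spec layout, and the paths and date-based naming convention are made-up examples.

```yaml
# Hypothetical job spec snippet: pushFileNamePattern does not exist yet;
# it is the parameter proposed by this issue.
jobType: SegmentMetadataPush
# Directory in deep store holding all ~1200 segment tarballs.
outputDirURI: 'hdfs://namenode/pinot/deepstore/myTable/'
# Existing parameter: which files in the directory are segments at all.
includeFileNamePattern: 'glob:**/*.tar.gz'
# Proposed parameter, same glob/regex syntax as includeFileNamePattern:
# which of those segments to actually push. Here, only segments whose
# names carry today's build date (example convention) would be pushed.
pushFileNamePattern: 'glob:**/*_2022-01-25.tar.gz'
pinotClusterSpecs:
  - controllerURI: 'http://localhost:9000'
```

With such a filter, the daily job would download and extract metadata for only the 35 segments it rebuilt, rather than the full 3 years of data.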