kkrugler opened a new issue #8141: URL: https://github.com/apache/pinot/issues/8141
We do daily builds of offline segments using Hadoop and store the results in HDFS, in the directory configured as our Pinot cluster's deep store. Each day the build generates 35 new (or, more typically, updated) per-month segments, which we then deploy to our Pinot cluster via a metadata push. As a result, the deep store directory in HDFS holds roughly 1,200 segments (representing 3 years of data) for the table. When we do the metadata push, every segment is downloaded, its metadata is extracted, and that metadata tarball is sent to the controller. This currently takes about 3 hours, even though we only want to push the 35 new segments.

A simple solution would be to support a new, optional `pushFileNamePattern` parameter in the job conf, which could be used to filter down to only the segments we care about. The format could be the same as the existing `includeFileNamePattern`.
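For illustration, a minimal sketch of how the proposed parameter might look in a batch ingestion job spec. The `pushFileNamePattern` key is hypothetical (it is the feature being requested here); `includeFileNamePattern`, `jobType`, `outputDirURI`, and `pinotClusterSpecs` follow the existing Pinot job spec layout, and the paths and date-based naming convention are made-up examples.

```yaml
# Hypothetical job spec snippet: pushFileNamePattern does not exist yet;
# it is the parameter proposed by this issue.
jobType: SegmentMetadataPush
# Directory in deep store holding all ~1200 segment tarballs.
outputDirURI: 'hdfs://namenode/pinot/deepstore/myTable/'
# Existing parameter: which files in the directory are segments at all.
includeFileNamePattern: 'glob:**/*.tar.gz'
# Proposed parameter, same glob/regex syntax as includeFileNamePattern:
# which of those segments to actually push. Here, only segments whose
# names carry today's build date (example convention) would be pushed.
pushFileNamePattern: 'glob:**/*_2022-01-25.tar.gz'
pinotClusterSpecs:
  - controllerURI: 'http://localhost:9000'
```

With such a filter, the daily job would download and extract metadata for only the 35 segments it rebuilt, rather than the full 3 years of data.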