rajagopr commented on PR #13597: URL: https://github.com/apache/pinot/pull/13597#issuecomment-2231578183
> I suppose the scenario you are trying to optimize is when each minion uploads more than one segment (presumably two, to split data over a time boundary?) and hundreds of minions working on multiple such segments? If not, can you please explain the use case a bit more?
>
> Also, can you use the start/end replace segments primitive that is already built?

@mcvsubbu: Sharing additional context below.

The SegmentRefreshTask (a custom minion task) helps align Pinot segments to the table configs. The task can do things like:

1) Break down a large segment into multiple small segments
2) Combine multiple small segments into one large segment
3) Sort the data based on the sort column, if the sort column changed
4) Re-partition the data, etc.

The task extends the abstract class BaseMultipleSegmentsConversionExecutor, which can operate on multiple segments. The task aligns the Pinot segments to the table configs over multiple invocations. After each round of invocation, the segments (or a subset of them) are a step closer to the table configuration.

When we encountered issues, each minion task was operating on up to twenty segments, there were about 50K segments in the table, and about 200 minions were operating at the same time. Once the SegmentConversionResults are generated, the minion attempts to upload the new segments, and there is a configured timeout of about ten minutes for uploading a segment. Upon closer inspection, we found that the contention happens while updating the IdealState: the threads are waiting to acquire the table update lock. Since many of the uploads time out, this results in a flood of retries, which worsens the problem.

Due to the contention issue described above, we are unable to increase the number of minions, and the overall SegmentRefreshTask takes multiple days to complete.
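To give a rough sense of scale for the numbers above, here is a back-of-the-envelope sketch, not a measurement. It assumes that every upload needs the same exclusive table lock in turn, that each task uploads about as many segments as it reads, and that all uploads queue at once. The lock hold time of 0.5 seconds is purely hypothetical, as are the helper names.

```python
import math


def worst_case_wait_seconds(queued_uploads: int, lock_hold_seconds: float) -> float:
    """Wait time of the last upload when all uploads take one exclusive lock in turn."""
    return (queued_uploads - 1) * lock_hold_seconds


def max_uploads_within_timeout(timeout_seconds: float, lock_hold_seconds: float) -> int:
    """Largest queue in which even the last upload finishes before the timeout."""
    return math.floor(timeout_seconds / lock_hold_seconds)


minions = 200             # from the comment above
segments_per_task = 20    # upper bound from the comment above
timeout_seconds = 600     # "about ten minutes"
lock_hold_seconds = 0.5   # ASSUMPTION: hypothetical time per IdealState update

# Pessimistic upper bound: every segment of every task is queued at once.
queued_uploads = minions * segments_per_task

last_wait = worst_case_wait_seconds(queued_uploads, lock_hold_seconds)
capacity = max_uploads_within_timeout(timeout_seconds, lock_hold_seconds)

print(f"queued uploads: {queued_uploads}")
print(f"last upload waits about {last_wait:.0f}s against a {timeout_seconds}s timeout")
print(f"uploads that fit within the timeout: {capacity}")
```

Under these assumptions the queue is longer than what fits inside the timeout, so some uploads would time out and retry, which is consistent with the behavior we observed. The real lock hold time and degree of concurrency may differ considerably.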
**cc:** @swaminathanmanish (in case you have more details to share)

On the point of making use of the start/end replace segments primitive: the BaseMultipleSegmentsConversionExecutor would already take care of it, right?