rajagopr commented on PR #13597:
URL: https://github.com/apache/pinot/pull/13597#issuecomment-2231578183

   > I suppose the scenario you are trying to optimize is when each minion 
uploads more than one segment (presumably two, to split data over a time 
boundary?) and hundreds of minions working on multiple such segments? If not, 
can you please explain the use case a bit more?
   > 
   > Also, can you use the start/end replace segments primitive that is already 
built?
   
   @mcvsubbu: Sharing additional context below.
   
   The SegmentRefreshTask (a custom minion task) helps align Pinot segments with
   the table config. Among other things, the task can:
   1) Break down a large segment into multiple smaller segments.
   2) Combine multiple small segments into one large segment.
   3) Sort the data on the sort column, if the sort column has changed.
   4) Re-partition the data.

   The task extends the abstract class BaseMultipleSegmentsConversionExecutor,
   which can operate on multiple segments at once. It aligns the Pinot segments
   with the table config over multiple invocations: after each round, the
   segments (or a subset of them) are a step closer to the table configuration.
   
   When we encountered these issues, each minion task was operating on up to
   twenty segments, the table had about 50K segments, and about 200 minions were
   running at the same time. Once the SegmentConversionResults are generated,
   the minion uploads the new segments, with a configured upload timeout of
   about ten minutes per segment. On closer inspection, we found that the
   contention happens while updating the IdealState: the threads are waiting to
   acquire the table update lock. Since many of the uploads time out, this
   results in a flood of retries, which worsens the problem.
   
   Because of the contention described above, we are unable to increase the
   number of minions, and the overall SegmentRefreshTask takes multiple days to
   complete.
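A rough way to see why the timeouts cascade: if every IdealState update for a table is serialized behind one lock and served in order, the last of N queued updates finishes after roughly N times the per-update lock hold time. The sketch below is a hypothetical FIFO model, not Pinot code. The 200 minions, twenty segments per task and ten-minute timeout come from this comment; "one IdealState update per uploaded segment", "all updates queued at the same instant" and the lock hold times are assumptions for illustration, not measurements.

```java
/**
 * Back-of-envelope model of the contention described in this comment. NOT Pinot code.
 * Assumptions: every IdealState update for the table is serialized behind one lock,
 * all updates are queued at the same instant and served first-in-first-out, and each
 * update holds the lock for a fixed, assumed amount of time.
 */
public class IdealStateContentionModel {

    /**
     * Returns how many of {@code queuedUpdates} simultaneous updates finish after
     * {@code uploadTimeoutMillis}, i.e. how many uploads the client would see time out.
     */
    static int timedOutUploads(int queuedUpdates, long lockHoldMillis, long uploadTimeoutMillis) {
        int timedOut = 0;
        for (int position = 0; position < queuedUpdates; position++) {
            // The update at queue position p waits for p earlier lock holders, then holds the lock itself.
            long finishMillis = (position + 1) * lockHoldMillis;
            if (finishMillis > uploadTimeoutMillis) {
                timedOut++;
            }
        }
        return timedOut;
    }

    public static void main(String[] args) {
        int minions = 200;                    // concurrent minions, from the comment
        int segmentsPerTask = 20;             // up to twenty segments per task, from the comment
        long timeoutMillis = 10L * 60 * 1000; // ~10 minute upload timeout, from the comment
        // Assumption: each uploaded segment triggers its own IdealState update.
        int queued = minions * segmentsPerTask;
        for (long holdMillis : new long[] {100, 200, 300}) { // assumed lock hold times
            int failed = timedOutUploads(queued, holdMillis, timeoutMillis);
            System.out.printf("hold=%d ms -> %d of %d uploads time out%n", holdMillis, failed, queued);
        }
    }
}
```

With these assumed numbers, a 100 ms hold per update stays within the timeout, but 200 ms already makes 1,000 of the 4,000 queued uploads fail. Each failure then retries and re-enters the same queue, which is the amplification described above.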
   
   **cc:** @swaminathanmanish (in case you have more details to share)
   
   As for using start/end replace segments: BaseMultipleSegmentsConversionExecutor
   already takes care of that, right?
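For context on the start/end replace segments primitive mentioned in the review, here is a simplified, hypothetical model of the idea: the new segments are registered against the ones they replace, uploaded while the old ones keep serving queries, and then committed in one step. The class, method and state names below are illustrative only; they are not Pinot's actual classes or controller API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical, simplified model of a start/end "replace segments" protocol.
 * Names are illustrative only; this is not Pinot's actual API.
 */
public class ReplaceSegmentsModel {

    enum State { IN_PROGRESS, COMPLETED }

    /** One replacement: segmentsFrom are swapped for segmentsTo when the entry completes. */
    static final class LineageEntry {
        final List<String> segmentsFrom;
        final List<String> segmentsTo;
        State state = State.IN_PROGRESS;

        LineageEntry(List<String> segmentsFrom, List<String> segmentsTo) {
            this.segmentsFrom = new ArrayList<>(segmentsFrom);
            this.segmentsTo = new ArrayList<>(segmentsTo);
        }
    }

    private final Set<String> onlineSegments = new HashSet<>();
    private final Map<String, LineageEntry> lineage = new HashMap<>();
    private int nextId = 0;

    ReplaceSegmentsModel(List<String> initialSegments) {
        onlineSegments.addAll(initialSegments);
    }

    /** Step 1: record which segments will replace which; nothing changes for queries yet. */
    String startReplaceSegments(List<String> from, List<String> to) {
        String id = "lineage-" + nextId++;
        lineage.put(id, new LineageEntry(from, to));
        return id;
    }

    /** Step 2: upload a new segment; it is online but hidden while its entry is in progress. */
    void uploadSegment(String segment) {
        onlineSegments.add(segment);
    }

    /** Step 3: commit, so queries switch from the old segments to the new ones in one step. */
    void endReplaceSegments(String id) {
        lineage.get(id).state = State.COMPLETED;
    }

    /** Segments a query would be routed to under this model. */
    Set<String> queryableSegments() {
        Set<String> result = new HashSet<>(onlineSegments);
        for (LineageEntry entry : lineage.values()) {
            if (entry.state == State.IN_PROGRESS) {
                result.removeAll(entry.segmentsTo);   // hide new segments until commit
            } else {
                result.removeAll(entry.segmentsFrom); // hide replaced segments after commit
            }
        }
        return result;
    }

    public static void main(String[] args) {
        ReplaceSegmentsModel table = new ReplaceSegmentsModel(List.of("small_1", "small_2"));
        String id = table.startReplaceSegments(List.of("small_1", "small_2"), List.of("merged_1"));
        table.uploadSegment("merged_1");
        System.out.println("before commit: " + table.queryableSegments());
        table.endReplaceSegments(id);
        System.out.println("after commit: " + table.queryableSegments());
    }
}
```

In this model every `uploadSegment` call still changes the set of online segments, so the primitive gives an atomic switch-over for queries rather than fewer table-level updates.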


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
