jadami10 commented on issue #15897:
URL: https://github.com/apache/pinot/issues/15897#issuecomment-3349388631

   @sajjad-moradi, we ran into an incident related to this internally.
   
   We have a fairly custom and complicated ingestion plugin, predating the OSS 
version, that allows consumption from multiple Kafka clusters. Our bug 
manifested when the controllers recognized new Kafka clusters/partitions that 
the servers were not yet able to consume from. This caused those CONSUMING 
segments to go OFFLINE, and we then failed all queries because their responses 
indicated partial results.
   
   The way we've remediated this internally is to:
   1. catch all exceptions when creating the partition group consumer
   2. fall back to an "empty consumer" that never returns any data but also 
never advances the offset
   3. emit a metric so our team can be alerted and investigate
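   
   A minimal sketch of those three steps, for illustration only. Every name 
here (`SketchConsumer`, `Batch`, `EmptyConsumer`, `SafeConsumerFactory`, 
`CREATE_FAILURES`) is a hypothetical stand-in and not Pinot's actual consumer 
interface or our internal plugin code, and the counter stands in for whatever 
metrics system is in use:

```java
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for a partition consumer; NOT Pinot's real interface.
interface SketchConsumer {
    Batch fetch(long startOffset);
}

// A fetched batch: the records plus the offset to resume from next time.
final class Batch {
    final List<String> records;
    final long nextOffset;

    Batch(List<String> records, long nextOffset) {
        this.records = records;
        this.nextOffset = nextOffset;
    }
}

// Step 2: a consumer that never returns data and never advances the offset.
final class EmptyConsumer implements SketchConsumer {
    @Override
    public Batch fetch(long startOffset) {
        return new Batch(Collections.emptyList(), startOffset);
    }
}

final class SafeConsumerFactory {
    // Step 3: a plain counter standing in for a real alerting metric (assumption).
    static final AtomicLong CREATE_FAILURES = new AtomicLong();

    // Step 1: catch every exception thrown while creating the real consumer.
    static SketchConsumer createOrEmpty(Callable<SketchConsumer> creator) {
        try {
            return creator.call();
        } catch (Exception e) {
            CREATE_FAILURES.incrementAndGet();
            return new EmptyConsumer();
        }
    }
}
```

   Note that this sketch does not retry creation on its own; once the empty 
consumer is in place, something else has to trigger a new attempt.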
   
   What I'm wondering is:
   1. Should this also be the default behavior in OSS? Is there any case where 
indicating partial results for missing the latest data is preferred over just 
lagging?
   2. Or does it make more sense to update Pinot to indicate that the missing 
data is strictly from consuming segments, and let clients decide?
   3. Both?
   
   Notably, to recover from this incident, we also had to force commit to get 
the servers to re-attempt consumption.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

