jadami10 commented on issue #15897: URL: https://github.com/apache/pinot/issues/15897#issuecomment-3349388631
@sajjad-moradi, we ran into an incident related to this internally. We have a fairly custom and complicated ingestion plugin, predating the OSS version, that allows consumption from multiple Kafka clusters. Our bug manifested when the controllers recognized new Kafka clusters/partitions that the servers were not yet able to consume from. Those CONSUMING segments went OFFLINE, and we then failed all queries because they reported partial results.

We've remediated this internally by:
1. catching all exceptions when creating the partition group consumer
2. falling back to an "empty consumer" that never returns any data but also never advances the offset
3. emitting a metric so our team can be alerted and investigate

My question is whether we should:
1. make this the default behavior in OSS. Is there any case where reporting partial results because the latest data is missing is preferable to just lagging?
2. instead update Pinot to indicate that the missing data comes strictly from consuming segments, and let clients decide
3. do both

Notably, to recover from this incident, we also had to force commit to get the servers to re-attempt consumption.
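For illustration, here is a minimal Java sketch of the three-step fallback (catch, empty consumer, metric). It assumes hypothetical `PartitionConsumer`, `ConsumerFactory`, `Batch`, and `EmptyConsumer` types and uses an `AtomicLong` as a stand-in for a real metric. None of these are Pinot's actual stream SPI classes or our internal plugin code.

```java
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

public class FallbackConsumerSketch {

    // Hypothetical batch: messages read plus the offset to resume from next time.
    public static final class Batch {
        private final List<String> messages;
        private final long nextOffset;

        public Batch(List<String> messages, long nextOffset) {
            this.messages = messages;
            this.nextOffset = nextOffset;
        }

        public List<String> messages() { return messages; }
        public long nextOffset() { return nextOffset; }
    }

    // Hypothetical per-partition consumer (NOT Pinot's real interface).
    public interface PartitionConsumer {
        Batch fetch(long startOffset, int timeoutMs);
    }

    // Hypothetical factory; creation can throw, e.g. when a newly discovered
    // cluster/partition is not yet reachable from this server.
    @FunctionalInterface
    public interface ConsumerFactory {
        PartitionConsumer create(String clusterId, int partition) throws Exception;
    }

    // Never returns data and never advances the offset, so the partition lags
    // instead of its segment failing.
    public static final class EmptyConsumer implements PartitionConsumer {
        @Override
        public Batch fetch(long startOffset, int timeoutMs) {
            return new Batch(Collections.emptyList(), startOffset);
        }
    }

    // Stand-in for a real metrics counter that alerting would watch.
    public static final AtomicLong CONSUMER_CREATION_FAILURES = new AtomicLong();

    public static PartitionConsumer createOrFallback(ConsumerFactory factory, String clusterId, int partition) {
        try {
            return factory.create(clusterId, partition);
        } catch (Exception e) {
            // Step 3: emit a metric so the failure is visible and can be alerted on.
            CONSUMER_CREATION_FAILURES.incrementAndGet();
            System.err.println("Falling back to empty consumer for " + clusterId + "/" + partition + ": " + e);
            // Step 2: fall back to a consumer that returns nothing and keeps the offset.
            return new EmptyConsumer();
        }
    }

    public static void main(String[] args) {
        // A factory that always fails, simulating an unreachable new cluster.
        ConsumerFactory failing = (cluster, p) -> {
            throw new IllegalStateException("cluster not reachable yet");
        };
        PartitionConsumer consumer = createOrFallback(failing, "kafka-new", 3);
        Batch batch = consumer.fetch(42L, 1000);
        System.out.println(batch.messages().size() + " " + batch.nextOffset() + " " + CONSUMER_CREATION_FAILURES.get());
    }
}
```

Because `EmptyConsumer` returns its start offset as the next offset, nothing is skipped. Once a real consumer can be created (in our case only after a force commit), consumption resumes from the same place. The metric is the only signal that the partition never actually started consuming.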
