mcvsubbu opened a new issue #7741:
URL: https://github.com/apache/pinot/issues/7741


   This issue happens on tables that are configured with an offset criteria of 
anything other than SMALLEST.
   
   Tables are often provisioned with offset criteria set to LARGEST (that is, 
ignore earlier offsets and consume only from the latest messages). This is done 
so that we don't have to consume older data from a stream, only to discard it 
all because it is too old. Other possible criteria are CUSTOM or TIME period 
based. 
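
   As a rough illustration of what the two simplest criteria mean for a 
consumer's starting point, here is a minimal Python sketch. It is not Pinot 
code: the `OffsetCriteria` enum, the `resolve_start_offset` function, and the 
idea of describing a partition by its earliest retained offset and the offset 
of the next message to be written are all assumptions made for illustration. 
The CUSTOM and TIME period criteria are left out.

```python
from enum import Enum


class OffsetCriteria(Enum):
    """Hypothetical stand-in for the table's offset criteria (not Pinot's class)."""
    SMALLEST = "smallest"
    LARGEST = "largest"


def resolve_start_offset(criteria, earliest_offset, next_offset):
    """Return the offset a newly created consumer would start reading from.

    earliest_offset: offset of the oldest message still retained in the partition.
    next_offset: offset that the next produced message will receive.
    """
    if criteria is OffsetCriteria.SMALLEST:
        # Read everything that is still available in the partition.
        return earliest_offset
    if criteria is OffsetCriteria.LARGEST:
        # Skip everything already written; only see messages produced from now on.
        return next_offset
    raise ValueError(f"unsupported criteria in this sketch: {criteria}")


# A partition that currently retains offsets 100..499.
print(resolve_start_offset(OffsetCriteria.SMALLEST, 100, 500))  # 100
print(resolve_start_offset(OffsetCriteria.LARGEST, 100, 500))   # 500
```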
   
   Pinot has a periodic task (RealtimeSegmentValidationManager) that scans the 
stream for new partitions and starts consumers for any new partitions it 
detects. It is possible (and most likely the case) that the new partitions were 
created in between two runs of RealtimeSegmentValidationManager, and that they 
already have some data in them by the time they are detected.
   
   In such cases, Pinot applies the offset criteria specified in the table 
config to the newly detected partitions as well. It therefore skips the first 
messages in those partitions and consumes only from the offset that the 
criteria resolves to. 
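
   The sequence can be sketched in a few lines of Python. This is a toy model, 
not Pinot code: the `messages_skipped` function, its parameters, and the 
numbers are made up for illustration, and it assumes that LARGEST resolves to 
the offset of the next message to be produced at the moment the consumer is 
created.

```python
def messages_skipped(criteria, produced_before_detection):
    """Count messages a brand-new partition loses when its first consumer starts.

    criteria: "SMALLEST" or "LARGEST" (the only criteria modelled here).
    produced_before_detection: messages written to the new partition before the
        periodic validation task noticed it and created a consumer.
    """
    earliest_offset = 0  # a newly created partition starts at offset 0
    next_offset = produced_before_detection
    if criteria == "SMALLEST":
        start_offset = earliest_offset
    elif criteria == "LARGEST":
        start_offset = next_offset
    else:
        raise ValueError(f"criteria not modelled: {criteria}")
    # Everything between the earliest offset and the start offset is never read.
    return start_offset - earliest_offset


# Say 1,200 messages land in the new partition before the next validation run.
print(messages_skipped("SMALLEST", 1200))  # 0
print(messages_skipped("LARGEST", 1200))   # 1200
```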
   
   This behavior was introduced in PR #4695.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


