amrishlal commented on issue #7004:
URL: https://github.com/apache/pinot/issues/7004#issuecomment-886098400


   > You might have also noticed that the case where offset > 
last_recorded_offset assumes a single-partition topic.
   
   The use case for which `offset > last_recorded_offset` works properly 
appears to be quite narrow. I think we should look into a more generic 
solution that works across all Kafka partitions, other streaming systems 
(Kinesis, for example), and segments that are generated offline as well (a 
virtual id could be assigned to each row or segment as the segment is added 
to Pinot).
   
   Maintaining a global, strictly monotonically increasing counter would be 
difficult and inefficient. An approximately monotonic counter across all 
instances should be neither, and it would produce "good enough" results for 
this use case (an internal application polls Pinot at regular intervals to 
fetch updates). Results could be further refined by, for example, comparing 
hashes of old rows against new rows while polling the result set for more 
results.
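   
   To make the idea concrete, here is a rough Python sketch of what I have in 
mind. It is not existing Pinot code. `ApproxCounter`, `row_hash`, `poll`, and 
the `skew_ms` window are all hypothetical names, and it assumes each instance 
stamps rows with its own wall clock plus a local sequence number. The poller 
re-reads a small window behind the last id it saw and uses row hashes to drop 
rows it has already delivered.

```python
import hashlib
import itertools
import time


class ApproxCounter:
    """Hypothetical per-instance id generator.

    Ids are (clock millis, instance id, local sequence) tuples. They are
    strictly increasing within one instance and only approximately ordered
    across instances, up to the clock skew between them.
    """

    def __init__(self, instance_id, clock=time.time):
        self.instance_id = instance_id
        self._clock = clock
        self._seq = itertools.count()
        self._last_ms = 0

    def next_id(self):
        # Never move backwards locally, even if the wall clock does.
        self._last_ms = max(self._last_ms, int(self._clock() * 1000))
        return (self._last_ms, self.instance_id, next(self._seq))


def row_hash(row):
    """Stable hash of a row's contents (row is assumed to be a flat dict)."""
    return hashlib.sha256(repr(sorted(row.items())).encode()).hexdigest()


def poll(rows, last_seen_ms, seen_hashes, skew_ms=1000):
    """Return rows not delivered before, plus the newest timestamp seen.

    rows is an iterable of (virtual_id, row) pairs. The poller looks back
    skew_ms behind last_seen_ms so rows stamped by a slower clock are not
    missed, and uses seen_hashes to skip rows that were already returned.
    """
    cutoff = last_seen_ms - skew_ms
    fresh = []
    newest = last_seen_ms
    for vid, row in rows:
        if vid[0] < cutoff:
            continue  # too old to be affected by clock skew
        h = row_hash(row)
        if h in seen_hashes:
            continue  # already delivered on an earlier poll
        seen_hashes.add(h)
        fresh.append(row)
        newest = max(newest, vid[0])
    return fresh, newest
```

   Caveats of this sketch: rows with identical contents collapse into one 
because only the contents are hashed, `seen_hashes` grows without bound unless 
it is pruned, and rows arriving later than `skew_ms` behind the newest id are 
missed. That is the "good enough" trade-off.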
   
   FWIW, I would vote for a holistic solution that is "good enough" rather than 
a "perfect" solution that works only for a single-partition Kafka stream.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


