amrishlal commented on issue #7004: URL: https://github.com/apache/pinot/issues/7004#issuecomment-886098400
> You might have also noticed that the case where `offset > last_recorded_offset` assumes a single-partition topic.

The use case for which `offset > last_recorded_offset` works properly appears to be quite narrow (?). I think we should look into a more generic solution that works across all Kafka partitions, other streaming systems (Kinesis, for example), and segments generated offline (a virtual ID could be assigned to each row or segment as the segment is added to Pinot).

Maintaining a global, strictly monotonically increasing counter would be difficult and inefficient. An approximately monotonically increasing counter across all instances, however, should not be that difficult or inefficient to maintain. It would produce "good enough" results for the use case (an internal application that polls Pinot at regular intervals to fetch updates), and the results could be further fine-tuned by, for example, comparing hashes of old rows against new rows while polling the result set for more results.

FWIW, I would vote for a holistic solution that is "good enough" rather than a "perfect" solution that works only for a single-partition Kafka stream.
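To make the "approximate counter plus hash comparison" idea more concrete, here is a minimal Python sketch. It is not Pinot code, and nothing in it is an existing Pinot API: `ApproxCounter`, `Poller`, the `fetch` callback, and the bit layout are all hypothetical. The counter uses a Snowflake-style layout (milliseconds in the high bits, a 10-bit instance id, a 12-bit per-millisecond sequence). That layout is strictly increasing within one instance but only approximately ordered across instances, to the extent their clocks agree. The poller re-reads an `overlap` window below its high-water mark so that late rows with smaller ids are still picked up, and it uses row hashes to skip rows it has already returned.

```python
import hashlib
import threading
import time
from typing import Callable, Iterable, Optional

SEQ_BITS = 12
INSTANCE_BITS = 10


class ApproxCounter:
    """Hypothetical Snowflake-style id generator for one instance.

    id = (milliseconds << 22) | (instance_id << 12) | sequence
    Strictly increasing on one instance; across instances the order is only
    as good as clock synchronization, hence "approximately" monotonic.
    """

    def __init__(self, instance_id: int, clock: Optional[Callable[[], int]] = None):
        if not 0 <= instance_id < (1 << INSTANCE_BITS):
            raise ValueError("instance_id must fit in 10 bits")
        self.instance_id = instance_id
        self.clock = clock or (lambda: time.time_ns() // 1_000_000)
        self.last_ms = -1
        self.seq = 0
        self.lock = threading.Lock()

    def next_id(self) -> int:
        with self.lock:
            # Never go backwards, even if the wall clock is adjusted back.
            now = max(self.clock(), self.last_ms)
            if now == self.last_ms:
                self.seq += 1
                if self.seq >= (1 << SEQ_BITS):  # sequence exhausted: borrow next ms
                    now += 1
                    self.seq = 0
            else:
                self.seq = 0
            self.last_ms = now
            return ((now << (INSTANCE_BITS + SEQ_BITS))
                    | (self.instance_id << SEQ_BITS) | self.seq)


def row_hash(row: tuple) -> str:
    # Content hash used to recognize rows that were already returned.
    return hashlib.sha256(repr(row).encode()).hexdigest()


class Poller:
    """Hypothetical client-side poller that tolerates approximate ordering."""

    def __init__(self, fetch: Callable[[int], Iterable[tuple[int, tuple]]], overlap: int):
        self.fetch = fetch          # fetch(min_id) -> (virtual_id, row) pairs with id >= min_id
        self.overlap = overlap      # how far below the high-water mark to re-read
        self.watermark = 0          # highest virtual id seen so far
        self.seen: dict[str, int] = {}  # row hash -> virtual id, for rows in the window

    def poll(self) -> list[tuple]:
        lower = max(0, self.watermark - self.overlap)
        new_rows = []
        for vid, row in self.fetch(lower):
            h = row_hash(row)
            if h in self.seen:
                continue  # already returned by an earlier poll
            self.seen[h] = vid
            new_rows.append(row)
            self.watermark = max(self.watermark, vid)
        # Forget hashes below the window; the next poll will not fetch those ids.
        cutoff = self.watermark - self.overlap
        self.seen = {h: v for h, v in self.seen.items() if v >= cutoff}
        return new_rows
```

The `overlap` should cover the expected clock skew plus ingestion delay, expressed in id units (with this layout, roughly `skew_ms << 22`). Rows that arrive further behind than that are missed, which is the "good enough" trade-off. Hashing row contents also means two genuinely identical rows collapse into one; hashing `(id, row)` instead avoids that, at the cost of relying on the ids alone for deduplication.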