mcvsubbu commented on issue #8492: URL: https://github.com/apache/pinot/issues/8492#issuecomment-1096995899
> Yes, @jackjlli and I had a discussion yesterday and reached the same conclusion: use some sort of ID. The problem is that the threads handling these activities can run in any order. (Of course, we may choose not to handle multiple simultaneous requests.) In this case, it is useful to know the _highest_ sequence number of the messages handled. And then we need to consider restarts. If a server restarts, then we should treat it as if all messages are handled. Then we may run into race conditions where a message may never be received (or is discarded after receipt), etc. Not very straightforward.
>
> A more complete solution would be to keep a mirror of the IDEALSTATE znode in zk (or, if Helix 1.x supports it, additional data along with the idealstate). For example, `"segment_0": { "host1": {"ONLINE", "105"}, "host2": {"ONLINE", "106"}}`. In this case, it indicates that host1 processed message 105 whereas host2 processed 106. Unless Helix 1.x supports an idealstate extension natively, this can lead to a new znode, something we have tried to avoid in the past.
>
> I suggest that we NOT implement this API. Instead, we consider the use cases. Most common cases:
>
> - Operators may issue a single reload, and can use the segment status API to get the status.
> - Operators will almost never try to get statuses and verify that they are the same if no reload is issued. So we either remove this from the test code or just put in a sleep. But let us not code for this to work right.
> - If operators do issue multiple reloads for the (segments of the) same table, the code will work fine as long as the bug that has been found is fixed (the bug where we load the table config before grabbing the semaphore).
>
> Just fix the bug, and document it well (that if users have 1000 servers for a single table and do a reload, the ZooKeeper server may be overloaded with 1000 requests coming in at the same time).

-- This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org
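
A minimal sketch of the ordering issue the comment refers to (loading the table config before grabbing the semaphore). This is not Pinot code: Pinot is written in Java, and `ConfigStore`, `reload_segment`, `simulate`, and the integer config versions are all hypothetical names made up for illustration. It also assumes one reading of why the order matters, namely that a reload which snapshots the config and then waits for a permit can end up applying a stale config if the config changes while it waits.

```python
import threading


class ConfigStore:
    """Hypothetical stand-in for the table config kept in ZooKeeper."""

    def __init__(self, version):
        self._lock = threading.Lock()
        self._version = version

    def get(self):
        with self._lock:
            return self._version

    def set(self, version):
        with self._lock:
            self._version = version


def reload_segment(store, permits, applied, started, fetch_after_acquire):
    """Reload one segment, recording which config version was applied."""
    if fetch_after_acquire:
        started.set()
        with permits:
            # Fixed order: read the config only once a permit is held.
            applied.append(store.get())
    else:
        # Buggy order: snapshot the config, then wait for a permit.
        config = store.get()
        started.set()
        with permits:
            applied.append(config)


def simulate(fetch_after_acquire):
    """Return the config version applied by a reload that had to queue."""
    store = ConfigStore(1)
    permits = threading.Semaphore(1)
    applied = []
    started = threading.Event()

    permits.acquire()  # pretend another reload holds the only permit
    worker = threading.Thread(
        target=reload_segment,
        args=(store, permits, applied, started, fetch_after_acquire),
    )
    worker.start()
    started.wait()     # worker is now at (or about to reach) the semaphore
    store.set(2)       # config changes while the worker is queued
    permits.release()  # let the queued reload proceed
    worker.join()
    return applied[0]


if __name__ == "__main__":
    print(simulate(fetch_after_acquire=False))  # 1 (stale config applied)
    print(simulate(fetch_after_acquire=True))   # 2 (latest config applied)
```

With the read moved inside the permit, a queued reload always sees the config as of the moment it actually runs. The trade-off, as the comment notes, is that every reload reads the config store itself, so a table spread across many servers produces that many reads at roughly the same time.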