mapshen opened a new issue #7850:
URL: https://github.com/apache/pinot/issues/7850


   We run Pinot 0.8.0. When setting up a realtime table that consumes from a 
single-partition Kafka topic, we set `replicasPerPartition` to 2, which means 
there are two consuming segments running on two separate Pinot servers. 
However, if you take one server down, wait for a while, and then bring it back 
up, queries can still hit either of the two consuming segments even though one 
is lagging behind, which leads to inconsistent or incorrect query results. As 
users, we expect the query to be routed to the consuming segment that has 
newer data.
   
   Steps to reproduce:
   1. Set up a realtime table with `replicasPerPartition` set to 2. Also set 
`realtime.segment.flush.threshold.time` to something like `12h` to make sure 
there is no segment flush during the testing.
   2. Let the two replica segments, on Host A and Host B respectively, 
consume for 5 minutes.
   3. Stop the Pinot server on Host A. Wait for 5 minutes and then start it.
   4. Wait for both consuming segments to come back online.
   5. In the UI, repeatedly run a PQL query that scans all segments, such as 
`select * from <table> order by <columnA> desc`. You will see that 
`numDocsScanned` alternates as the query gets routed to different consuming 
segments.
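
   Step 5 can also be scripted instead of clicking through the UI. The 
following is a minimal sketch, not part of the original report. It assumes the 
broker is reachable at `http://localhost:8099` and accepts SQL at 
`/query/sql` (the report itself used PQL in the UI); `myTable` and `columnA` 
are hypothetical placeholders. Because the table keeps ingesting, a healthy 
run should show `numDocsScanned` staying flat or growing, so the check flags 
any sample that is lower than the one before it.

```python
import json
import urllib.request


def run_query(broker_url, sql):
    """POST a SQL query to a Pinot broker and return the parsed JSON response.

    Assumption: the broker exposes the /query/sql endpoint at broker_url.
    """
    req = urllib.request.Request(
        broker_url.rstrip("/") + "/query/sql",
        data=json.dumps({"sql": sql}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def sample_docs_scanned(fetch, attempts=20):
    """Call fetch() repeatedly and collect numDocsScanned from each response."""
    return [fetch()["numDocsScanned"] for _ in range(attempts)]


def has_regression(samples):
    """Return True if any sample is lower than the sample right before it."""
    return any(later < earlier for earlier, later in zip(samples, samples[1:]))


if __name__ == "__main__":
    # Hypothetical broker address, table, and column; adjust to your setup.
    broker = "http://localhost:8099"
    sql = "select * from myTable order by columnA desc"
    samples = sample_docs_scanned(lambda: run_query(broker, sql))
    print(samples)
    print("lagging replica was queried:", has_regression(samples))
```

   If the output reports a drop between consecutive samples, at least one 
query was likely served by the replica that is behind.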


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


