Jackie-Jiang commented on issue #7849:
URL: https://github.com/apache/pinot/issues/7849#issuecomment-1062417937


   @tuor713 Within a single streaming partition, there is at most one 
consuming segment at any time. The small inconsistency arises because the 
consuming segment replaces a doc from a completed segment while the 
`validDocIds` of the two segments are not read at the same time, e.g.:
   1. Query engine reads `validDocIds` from the completed segment (creating a 
copy)
   2. Consuming segment invalidates one doc in the completed segment (not 
visible to the query engine because a copy/snapshot has already been made), and 
marks the replacement doc as valid in its own `validDocIds`
   3. Query engine reads `validDocIds` from the consuming segment
   4. The same doc is counted twice: once from the stale copy of the completed 
segment, and once from the consuming segment
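
   The interleaving above can be replayed deterministically in a small 
sketch. This is only an illustration: the class and method names are 
hypothetical, and `java.util.BitSet` stands in for whatever structure 
actually backs `validDocIds` in Pinot.

```java
import java.util.BitSet;

// Hypothetical stand-in for two segments' validDocIds; not Pinot's real classes.
class ValidDocIdsRaceSketch {

    // Replays steps 1-4: the upsert lands between the two snapshot reads.
    static int countWithInterleaving() {
        BitSet completed = new BitSet();
        BitSet consuming = new BitSet();
        completed.set(0); // the only doc for the key lives in the completed segment

        // 1. Query engine copies validDocIds of the completed segment.
        BitSet completedCopy = (BitSet) completed.clone();

        // 2. Consuming segment invalidates the old doc and marks its own doc valid.
        completed.clear(0);
        consuming.set(0);

        // 3. Query engine copies validDocIds of the consuming segment.
        BitSet consumingCopy = (BitSet) consuming.clone();

        // 4. The stale copy still has the old doc, so the key is counted twice.
        return completedCopy.cardinality() + consumingCopy.cardinality();
    }

    // Same data, but both copies are taken before the upsert is applied.
    static int countWithConsistentSnapshot() {
        BitSet completed = new BitSet();
        BitSet consuming = new BitSet();
        completed.set(0);

        BitSet completedCopy = (BitSet) completed.clone();
        BitSet consumingCopy = (BitSet) consuming.clone();

        completed.clear(0);
        consuming.set(0);

        return completedCopy.cardinality() + consumingCopy.cardinality();
    }

    public static void main(String[] args) {
        System.out.println("interleaved: " + countWithInterleaving());
        System.out.println("consistent:  " + countWithConsistentSnapshot());
    }
}
```

   With the interleaved reads the single key contributes two valid docs; 
when both copies are taken at the same logical point it contributes one.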
   
   To solve this problem, we need a global sync: take a snapshot of all 
queried segments while blocking the ingestion (as shown in the fix above). 
The solution works, and we can avoid creating the extra `IndexSegment` 
snapshot objects by snapshotting the `validDocIds` within the 
`FilterPlanNode`, but it can cause contention and starvation between query 
and ingestion. For high-QPS use cases, queries can block each other as well 
as the ingestion.
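
   A minimal sketch of the global-sync idea, assuming a single lock shared by 
the ingestion path and the query path. It is not the fix referenced above: 
`GlobalSyncSketch`, `upsert`, and `snapshotAll` are hypothetical names, and 
`BitSet` again stands in for the real `validDocIds` structure.

```java
import java.util.BitSet;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical illustration of a global sync; not Pinot's actual implementation.
class GlobalSyncSketch {
    private final ReentrantLock lock = new ReentrantLock();
    private final BitSet completed = new BitSet();
    private final BitSet consuming = new BitSet();

    GlobalSyncSketch() {
        completed.set(0); // one valid doc in the completed segment to start with
    }

    // Ingestion path: both bitmap updates happen atomically with respect to queries.
    void upsert(int oldDocId, int newDocId) {
        lock.lock();
        try {
            completed.clear(oldDocId);
            consuming.set(newDocId);
        } finally {
            lock.unlock();
        }
    }

    // Query path: copy the validDocIds of every queried segment under the same lock,
    // so no upsert can land between the two copies.
    BitSet[] snapshotAll() {
        lock.lock();
        try {
            return new BitSet[] {(BitSet) completed.clone(), (BitSet) consuming.clone()};
        } finally {
            lock.unlock();
        }
    }

    // Total number of valid docs across all snapshots.
    static int validCount(BitSet[] snapshots) {
        int total = 0;
        for (BitSet snapshot : snapshots) {
            total += snapshot.cardinality();
        }
        return total;
    }
}
```

   Because every query and every upsert goes through the same lock, the cost 
described above is visible directly: concurrent queries serialize on 
`snapshotAll`, and ingestion waits behind them.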
   
   We can make it configurable for use cases that require 100% consistency, 
but 100% consistency is usually not necessary for analytical purposes. 
Essentially, it is a trade-off between consistency and performance.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
