klsince opened a new pull request, #13677:
URL: https://github.com/apache/pinot/pull/13677

   Recently, PR #12976 added support for a consistent upsert view by 
synchronizing updates to segments' validDocIds bitmaps with queries reading 
those bitmaps. Recent tests show that this is not enough: we also need to track 
the segments belonging to a table partition completely and consistently. 
   
   The current segment tracking logic for upsert view is broken in the 
following places:
   1. (completeness) Using `_trackedSegments` is not enough to ensure a 
consistent upsert view, because this Set is updated after the segment is 
registered to the table manager. So a query can see a segment before it is 
included in the upsert view. When that happens today, the query falls back to 
getting the segment's bitmap outside the locking logic used to ensure 
consistency.
   2. (completeness) When committing a mutable segment, a new immutable 
segment is created to replace it, but the mutable segment is kept intact during 
segment replacement for queries to access. Queries can't access the new 
immutable segment, as it's not registered until the replacement is done. 
However, the upsert metadata (mainly the map from PK to record location) gets 
updated during segment replacement, so newly ingested records start to 
invalidate docs in the new immutable segment instead of the old mutable 
segment. This causes queries to see more valid docs than expected.
   3. (completeness) The server can start a consuming segment before the 
broker adds it to the routing table, even with the earlier fix (PR #11978), 
because the IdealState change is handled independently on servers and brokers. 
We don't want to synchronize brokers and servers anyway, due to the cost and 
complexity.
   4. (consistency) When executing a query, the server first acquires the 
segments and then gets their validDocIds bitmaps. Between the two steps, new 
segments can be added, either a new consuming segment or a new immutable 
segment replacing a mutable one. The query can't access the new segments' 
bitmaps because it has not acquired them, so it sees fewer valid docs than 
expected.
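
   To make the window in item 4 concrete, here is a minimal single-threaded 
replay of that interleaving. It is only a sketch: `UnsyncedPartitionSketch` and 
all of its methods are hypothetical names, not Pinot classes, and 
`java.util.BitSet` stands in for the real validDocIds bitmap type.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for an upsert partition; not Pinot's actual classes.
class UnsyncedPartitionSketch {
  // Segment name -> validDocIds (BitSet is an assumption standing in for the bitmap).
  final Map<String, BitSet> segments = new LinkedHashMap<>();

  void addSegment(String name) {
    segments.put(name, new BitSet());
  }

  // An upsert: invalidate the PK's old location (if any), validate the new one.
  void upsert(String oldSeg, int oldDoc, String newSeg, int newDoc) {
    if (oldSeg != null) {
      segments.get(oldSeg).clear(oldDoc);
    }
    segments.get(newSeg).set(newDoc);
  }

  // Query step 1: acquire whatever segments exist right now.
  List<String> acquireSegments() {
    return new ArrayList<>(segments.keySet());
  }

  // Query step 2: read the bitmaps of the segments acquired in step 1.
  int countValidDocs(List<String> acquired) {
    int total = 0;
    for (String name : acquired) {
      total += segments.get(name).cardinality();
    }
    return total;
  }

  // Replays the interleaving on one thread; one PK exists, so 1 valid doc is expected.
  static int replayRace() {
    UnsyncedPartitionSketch p = new UnsyncedPartitionSketch();
    p.addSegment("seg_0");
    p.upsert(null, -1, "seg_0", 0);               // the PK lives in seg_0, doc 0
    List<String> acquired = p.acquireSegments();  // query acquires only seg_0
    p.addSegment("seg_1");                        // a new segment appears
    p.upsert("seg_0", 0, "seg_1", 0);             // the PK moves to seg_1
    return p.countValidDocs(acquired);            // query reads seg_0's bitmap only
  }
}
```

   In this replay the query counts 0 valid docs although the PK has exactly one 
valid record, because the record moved to a segment the query never acquired.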
   
   To address those issues:
   1. A new Set is added to the upsert partition mgr to track segments 
completely and consistently. A segment is always added to this Set before it is 
registered to the table mgr, and the server locks the Set while acquiring 
segments and getting their validDocIds bitmaps, so segment membership cannot 
change in between.
   2. When committing a mutable segment, we create a new segment data mgr, 
called MultiSegmentDataManager, that registers both the mutable and the 
immutable segment to the table mgr before segment replacement starts, so that 
queries can acquire both. Meanwhile, we update the mutable segment in place 
during replacement instead of keeping it intact. This way, we don't need to 
block data ingestion to give queries a complete data view while committing a 
segment.
   3. A query always tries to acquire the latest consuming segment if it's 
not included in the broker's query request, so that the query doesn't miss 
newly ingested docs when the broker hasn't updated its routing table yet. 
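
   Items 1 and 3 above can be sketched roughly as follows. This is an 
illustration under assumptions, not the code in this PR: 
`TrackedPartitionSketch`, its fields, and its methods are hypothetical names, 
`java.util.BitSet` stands in for the validDocIds bitmap, a plain list stands in 
for the table mgr, and upserts simply take the write lock to keep the sketch 
short.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch of a partition mgr that tracks segments under a lock.
class TrackedPartitionSketch {
  private final ReadWriteLock lock = new ReentrantReadWriteLock();
  // The tracked segments and their validDocIds (BitSet is a stand-in).
  private final Map<String, BitSet> tracked = new LinkedHashMap<>();
  // Stand-in for the table mgr's registry of queryable segments.
  private final List<String> tableManager = new CopyOnWriteArrayList<>();

  // Item 1: track the segment first, register it to the table mgr second.
  void addSegment(String name) {
    lock.writeLock().lock();
    try {
      tracked.put(name, new BitSet());
    } finally {
      lock.writeLock().unlock();
    }
    tableManager.add(name);
  }

  // Simplification: upserts take the write lock so they never overlap a snapshot.
  void upsert(String oldSeg, int oldDoc, String newSeg, int newDoc) {
    lock.writeLock().lock();
    try {
      if (oldSeg != null) {
        tracked.get(oldSeg).clear(oldDoc);
      }
      tracked.get(newSeg).set(newDoc);
    } finally {
      lock.writeLock().unlock();
    }
  }

  // Item 1: membership and bitmaps are read under one lock, so neither can change midway.
  Map<String, BitSet> snapshotValidDocIds() {
    lock.readLock().lock();
    try {
      Map<String, BitSet> copy = new LinkedHashMap<>();
      for (Map.Entry<String, BitSet> e : tracked.entrySet()) {
        copy.put(e.getKey(), (BitSet) e.getValue().clone());
      }
      return copy;
    } finally {
      lock.readLock().unlock();
    }
  }

  boolean isRegistered(String name) {
    return tableManager.contains(name);
  }

  // Item 3: add the latest consuming segment when the broker's request omits it.
  static List<String> withLatestConsuming(List<String> requested, String latestConsuming) {
    List<String> result = new ArrayList<>(requested);
    if (latestConsuming != null && !result.contains(latestConsuming)) {
      result.add(latestConsuming);
    }
    return result;
  }
}
```

   Because a segment enters the tracked Set before it is registered, a query 
can never acquire a segment that the upsert view does not know about yet, and a 
snapshot taken under the read lock always covers every tracked segment.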
   
   A few other misc fixes:
   1. In the mutable segment, we should update _numDocsIndexed before updating 
the upsert metadata; otherwise, a query might see one fewer valid doc. 
   2. Return an empty bitmap instead of null when a segment doesn't have a 
bitmap yet, so that we don't fall back to getting the bitmap again outside the 
locking logic; by the time of that second read, the segment might have created 
its bitmap.
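
   The two misc fixes can be illustrated with a small sketch. It assumes, for 
illustration only, that a query counts valid docs whose docId is below the 
number of indexed docs; `MutableSegmentSketch` and its methods are hypothetical 
names, and `java.util.BitSet` again stands in for the bitmap.

```java
import java.util.BitSet;

// Hypothetical sketch of a mutable segment; not Pinot's actual class.
class MutableSegmentSketch {
  private BitSet validDocIds;   // created lazily, null until the first upsert
  private int numDocsIndexed;   // plays the role of _numDocsIndexed

  // Misc fix 2: hand out an empty bitmap rather than null.
  BitSet getValidDocIds() {
    return validDocIds == null ? new BitSet() : (BitSet) validDocIds.clone();
  }

  // Assumed query view: valid docs with docId below numDocsIndexed.
  int visibleValidDocs() {
    BitSet bits = getValidDocIds();
    int count = 0;
    for (int i = bits.nextSetBit(0); i >= 0 && i < numDocsIndexed; i = bits.nextSetBit(i + 1)) {
      count++;
    }
    return count;
  }

  void bumpNumDocsIndexed() {
    numDocsIndexed++;
  }

  // Invalidate the PK's old doc (if any) and validate the new one.
  void updateUpsertMetadata(int oldDoc, int newDoc) {
    if (validDocIds == null) {
      validDocIds = new BitSet();
    }
    if (oldDoc >= 0) {
      validDocIds.clear(oldDoc);
    }
    validDocIds.set(newDoc);
  }

  // What a query sees between the two steps of indexing a second version of one PK.
  static int midpointView(boolean bumpFirst) {
    MutableSegmentSketch s = new MutableSegmentSketch();
    s.bumpNumDocsIndexed();
    s.updateUpsertMetadata(-1, 0);   // first version: doc 0 is valid
    if (bumpFirst) {
      s.bumpNumDocsIndexed();        // misc fix 1: count the new doc first
    } else {
      s.updateUpsertMetadata(0, 1);  // other order: metadata first
    }
    return s.visibleValidDocs();     // query runs before the second step
  }
}
```

   Under these assumptions, `midpointView(true)` still sees the one valid doc, 
while `midpointView(false)` sees none: the old doc is already invalidated and 
the new doc is not yet within the indexed range.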
   
   All the changes in this PR are meant to be enabled via the feature flag 
upsertConfig.consistencyMode added in PR #12976, so upsert tables not using 
this feature shouldn't be affected.
   
   Still WIP, and more tests to be added.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

