klsince opened a new pull request, #13677: URL: https://github.com/apache/pinot/pull/13677
Recently, PR #12976 added support for a consistent upsert view by synchronizing updates to segments' validDocIds bitmaps with the queries reading those bitmaps. Recent tests show that this is not enough: we also need to track the segments belonging to a table partition completely and consistently. The current segment tracking logic for the upsert view is broken in the following places:

1. (completeness) Using `_trackedSegments` does not ensure a consistent upsert view, because this Set is updated after the segment is registered to the table manager. A query can therefore see a segment before it is included in the upsert view. When that happens today, the query falls back to getting the segment's bitmap outside the locking logic used to ensure consistency.
2. (completeness) When committing a mutable segment, a new immutable segment is created to replace it, but the mutable segment is kept intact during segment replacement for queries to access. Queries can't access the new immutable segment, as it's not registered until replacement is done. However, the upsert metadata (mainly the map from PK to record location) gets updated during segment replacement, so newly ingested records start to invalidate docs in the new immutable segment instead of the old mutable segment. This causes queries to see more valid docs than expected.
3. (completeness) The server can start a consuming segment before the broker adds it to the routing table, even with the earlier fix (PR #11978). This is because the server and the broker handle IdealState changes independently, and we don't want to synchronize brokers and servers anyway, due to the cost and complexity.
4. (consistency) When executing a query, the server first acquires segments and then gets their validDocIds bitmaps. Between the two steps, new segments can be added: either a new consuming segment or a new immutable segment that replaces a mutable segment. The query can't access the new segments' bitmaps because it has not acquired them, so it sees fewer valid docs than expected.

To address those issues:

1. A new Set is added to the upsert partition manager to track segments completely and consistently. A segment is always added to this Set before it is registered to the table manager. The server locks the Set while acquiring segments and getting their validDocIds bitmaps, so that segment membership cannot change in between.
2. When committing a mutable segment, we create a new segment data manager, called `MultiSegmentDataManager`, to register both the mutable and the immutable segment to the table manager before segment replacement starts, so that a query can acquire both. Meanwhile, we update the mutable segment in place during replacement instead of keeping it intact. This way, we don't need to block data ingestion to give queries a complete data view while a segment is being committed.
3. A query always tries to acquire the latest consuming segment if it's not included in the broker's query request, so that the query doesn't miss newly ingested docs when the broker hasn't updated its routing table yet.

A few other misc fixes:

1. In the mutable segment, we should update `_numDocsIndexed` before updating the upsert metadata; otherwise, a query might see one fewer valid doc.
2. Return an empty bitmap instead of null if a segment doesn't have a bitmap yet, so that we don't fall back to getting the bitmap again outside the locking logic. By the time of that second read, the segment might have created its bitmap.

All the new changes in this PR are supposed to be enabled via the new feature flag `upsertConfig.consistencyMode` added in PR #12976, so upsert tables not using this feature shouldn't be affected.

Still WIP, and more tests to be added.
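For illustration only, here is a minimal sketch of how the tracking Set, the locked acquire-and-read step, the empty-bitmap return, and the latest-consuming-segment lookup could fit together. This is not the code in this PR. The class `TrackedSegmentsSketch`, its methods, and its fields are hypothetical names, and `java.util.BitSet` stands in for the validDocIds bitmap type so that the example is self-contained.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for the upsert partition manager; not Pinot's actual class.
class TrackedSegmentsSketch {
  private final ReadWriteLock _lock = new ReentrantReadWriteLock();
  // Segment name -> validDocIds; a null value means the bitmap is not created yet.
  private final Map<String, BitSet> _tracked = new LinkedHashMap<>();
  // Name of the most recently tracked consuming segment, if any.
  private String _latestConsuming;

  // Assumed to be called BEFORE the segment is registered to the table manager.
  void trackSegment(String name, BitSet validDocIds, boolean consuming) {
    _lock.writeLock().lock();
    try {
      _tracked.put(name, validDocIds);
      if (consuming) {
        _latestConsuming = name;
      }
    } finally {
      _lock.writeLock().unlock();
    }
  }

  // Query path: resolve segments and copy their bitmaps under one read lock,
  // so segment membership cannot change between the two steps.
  Map<String, BitSet> snapshot(List<String> requested) {
    _lock.readLock().lock();
    try {
      List<String> names = new ArrayList<>(requested);
      // Add the latest consuming segment if the broker's request lacks it.
      if (_latestConsuming != null && !names.contains(_latestConsuming)) {
        names.add(_latestConsuming);
      }
      Map<String, BitSet> view = new LinkedHashMap<>();
      for (String name : names) {
        if (!_tracked.containsKey(name)) {
          continue; // not tracked by this partition manager
        }
        BitSet bitmap = _tracked.get(name);
        // Empty bitmap instead of null, so callers never retry outside the lock.
        view.put(name, bitmap == null ? new BitSet() : (BitSet) bitmap.clone());
      }
      return view;
    } finally {
      _lock.readLock().unlock();
    }
  }
}
```

In this sketch, a writer that mutates a tracked bitmap would also need to hold the write lock (or use a thread-safe bitmap); that part is omitted for brevity.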