klsince opened a new pull request, #11020: URL: https://github.com/apache/pinot/pull/11020
`feature`

This PR adds a feature to preload segments for tables that use the upsert snapshot feature. Segments with validDocIds snapshots can be preloaded more efficiently to speed up table loading (i.e. during server restarts). The primary keys of all valid docs across all segments with validDocIds snapshots are unique, so we can simply put their primary keys into the upsert metadata map without doing the costly checks for duplicate primary keys. Once preloading is done, the remaining segments are loaded as usual, i.e. checking for duplicates and updating validDocIds in the existing segments.

This feature adds synchronization logic between Helix threads and preloading threads for correctness.

## Release Note ##

1. Added a new instance config, `max.segment.preload.threads`, to configure how many background threads are used to preload segments. The thread pool is shared across tables.
2. Added a table config (in the upsert config), `enablePreload`, to control whether preloading is enabled.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at: us...@infra.apache.org
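To illustrate the idea behind preloading, here is a minimal sketch of the two load paths described in the PR: a fast path that inserts snapshot-valid docs into the metadata map without duplicate checks, and the regular path that compares against existing entries and updates validDocIds. All class, field, and method names (`UpsertPreloadSketch`, `Segment`, `RecordLocation`, `preloadSegment`, `addSegment`) are hypothetical and are not Pinot's actual API. The comparison-value tie-breaking rule is also an assumption, and the thread pool and Helix synchronization are left out.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Simplified, hypothetical model of upsert metadata; not Pinot's actual classes.
class UpsertPreloadSketch {

  // Stand-in for a segment: per-doc primary keys and comparison values, plus the
  // validDocIds bitmap (restored from a snapshot when one exists, else empty).
  static final class Segment {
    final String name;
    final String[] primaryKeys;
    final long[] comparisonValues;
    final BitSet validDocIds;

    Segment(String name, String[] primaryKeys, long[] comparisonValues, BitSet validDocIds) {
      this.name = name;
      this.primaryKeys = primaryKeys;
      this.comparisonValues = comparisonValues;
      this.validDocIds = validDocIds;
    }
  }

  // Where the latest copy of a primary key currently lives.
  static final class RecordLocation {
    final Segment segment;
    final int docId;
    final long comparisonValue;

    RecordLocation(Segment segment, int docId, long comparisonValue) {
      this.segment = segment;
      this.docId = docId;
      this.comparisonValue = comparisonValue;
    }
  }

  final Map<String, RecordLocation> keyToLocation = new HashMap<>();

  // Fast path: valid docs from snapshots are unique across segments, so each
  // key is put into the map directly, with no lookup of an existing entry.
  void preloadSegment(Segment segment) {
    BitSet valid = segment.validDocIds;
    for (int docId = valid.nextSetBit(0); docId >= 0; docId = valid.nextSetBit(docId + 1)) {
      keyToLocation.put(segment.primaryKeys[docId],
          new RecordLocation(segment, docId, segment.comparisonValues[docId]));
    }
  }

  // Regular path: check every doc for a duplicate key and keep the newer copy.
  // Assumption: on equal comparison values the doc added later wins.
  void addSegment(Segment segment) {
    for (int docId = 0; docId < segment.primaryKeys.length; docId++) {
      String key = segment.primaryKeys[docId];
      long value = segment.comparisonValues[docId];
      RecordLocation current = keyToLocation.get(key);
      if (current == null || value >= current.comparisonValue) {
        if (current != null) {
          // Invalidate the older copy in whichever segment holds it.
          current.segment.validDocIds.clear(current.docId);
        }
        segment.validDocIds.set(docId);
        keyToLocation.put(key, new RecordLocation(segment, docId, value));
      } else {
        segment.validDocIds.clear(docId);
      }
    }
  }

  // Helper to build a BitSet from doc ids.
  static BitSet bits(int... docIds) {
    BitSet bitSet = new BitSet();
    for (int docId : docIds) {
      bitSet.set(docId);
    }
    return bitSet;
  }

  public static void main(String[] args) {
    UpsertPreloadSketch metadata = new UpsertPreloadSketch();
    // Segment with a snapshot: doc 0 was already superseded by doc 2 for key "k1".
    Segment a = new Segment("A", new String[] {"k1", "k2", "k1"}, new long[] {1, 2, 3}, bits(1, 2));
    // Segment without a snapshot: goes through the regular path.
    Segment c = new Segment("C", new String[] {"k2", "k4"}, new long[] {5, 1}, new BitSet());
    metadata.preloadSegment(a);
    metadata.addSegment(c);
    System.out.println("A valid docs: " + a.validDocIds + ", C valid docs: " + c.validDocIds);
  }
}
```

The fast path is only safe because the snapshots guarantee uniqueness among valid docs. A segment without a snapshot must go through `addSegment`, since skipping the lookup there could leave two valid docs for the same primary key.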