klsince opened a new pull request, #11020:
URL: https://github.com/apache/pinot/pull/11020

   `feature`
   
   This PR adds a feature to preload segments of tables that use the upsert 
snapshot feature. Segments with validDocIds snapshots can be preloaded more 
efficiently, which speeds up table loading (for example after a server restart). 
   
   The primary keys of the valid docs across all segments with validDocIds 
snapshots are unique, so they can be put directly into the upsert metadata map 
without the costly checks for duplicate primary keys. Once preloading is done, 
the remaining segments are loaded as usual, i.e. by checking for duplicates and 
updating validDocIds in the existing segments. This feature adds 
synchronization logic between Helix threads and preloading threads for 
correctness.
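
   The difference between the two loading paths can be sketched roughly as 
follows. This is only an illustration of the idea, not the code in this PR: 
the class `UpsertPreloadSketch`, its nested types, and the use of a single 
`long` comparison value are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; these are not Pinot's actual classes.
final class UpsertPreloadSketch {
  // One row of a segment: primary key, doc id, and a comparison value (e.g. a timestamp).
  static final class Doc {
    final String primaryKey;
    final int docId;
    final long comparisonValue;

    Doc(String primaryKey, int docId, long comparisonValue) {
      this.primaryKey = primaryKey;
      this.docId = docId;
      this.comparisonValue = comparisonValue;
    }
  }

  // Where the currently valid doc for a primary key lives.
  static final class DocLocation {
    final String segment;
    final int docId;
    final long comparisonValue;

    DocLocation(String segment, int docId, long comparisonValue) {
      this.segment = segment;
      this.docId = docId;
      this.comparisonValue = comparisonValue;
    }
  }

  // Stand-in for the upsert metadata map: primary key -> location of the valid doc.
  private final Map<String, DocLocation> pkToLocation = new HashMap<>();

  // Preload path: the docs come from validDocIds snapshots, so their primary keys
  // are assumed unique across segments and are inserted without any lookup.
  void preloadSegment(String segment, List<Doc> validDocs) {
    for (Doc doc : validDocs) {
      pkToLocation.put(doc.primaryKey, new DocLocation(segment, doc.docId, doc.comparisonValue));
    }
  }

  // Regular path: every doc is checked against the map. Returns how many existing
  // docs were replaced, i.e. would have to be cleared from their segment's validDocIds.
  int addSegment(String segment, List<Doc> docs) {
    int replaced = 0;
    for (Doc doc : docs) {
      DocLocation current = pkToLocation.get(doc.primaryKey);
      if (current != null && doc.comparisonValue < current.comparisonValue) {
        continue; // the incoming doc is older than the valid one, so it is skipped
      }
      if (current != null) {
        replaced++;
      }
      pkToLocation.put(doc.primaryKey, new DocLocation(segment, doc.docId, doc.comparisonValue));
    }
    return replaced;
  }

  DocLocation locate(String primaryKey) {
    return pkToLocation.get(primaryKey);
  }
}
```

   In this sketch the preload path does one map write per doc, while the 
regular path does a lookup and a comparison first. The sketch leaves out the 
synchronization between Helix threads and preloading threads described above.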
   
   ## Release Note ##
   1. Added a new instance config, `max.segment.preload.threads`, to configure 
how many background threads are used to preload segments. The thread pool is 
shared across tables.
   2. Added a table config (in the upsert config), `enablePreload`, to control 
whether preloading is enabled.
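
   How the two settings might interact can be sketched as below. This is a 
hedged illustration only: `PreloadPoolSketch` and its methods are hypothetical 
names, and treating a non-positive thread count as "preloading disabled" is an 
assumption made for the example rather than something stated in this PR.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch; not the classes added by this PR.
final class PreloadPoolSketch {
  // One pool per server instance, shared by every table on that instance.
  private final ExecutorService preloadExecutor;

  // maxSegmentPreloadThreads stands in for the value of max.segment.preload.threads.
  PreloadPoolSketch(int maxSegmentPreloadThreads) {
    // Assumption: a non-positive value means no pool is created.
    this.preloadExecutor =
        maxSegmentPreloadThreads > 0 ? Executors.newFixedThreadPool(maxSegmentPreloadThreads) : null;
  }

  // Runs the given segment loaders on the shared pool and waits for all of them.
  // Returns the number of segments preloaded, or 0 when preloading does not apply.
  int preloadTable(boolean enablePreload, List<Runnable> segmentLoaders) {
    if (!enablePreload || preloadExecutor == null) {
      return 0; // the table falls back to the regular loading path
    }
    List<Future<?>> futures = new ArrayList<>();
    for (Runnable loader : segmentLoaders) {
      futures.add(preloadExecutor.submit(loader));
    }
    for (Future<?> future : futures) {
      try {
        future.get(); // block until this segment has been preloaded
      } catch (Exception e) {
        throw new RuntimeException("Segment preload failed", e);
      }
    }
    return futures.size();
  }

  void shutdown() {
    if (preloadExecutor != null) {
      preloadExecutor.shutdown();
    }
  }
}
```

   Because the pool is created once per instance, the configured thread count 
bounds the preloading work of all tables together, not of each table 
separately.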


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

