Re: [PR] Adding support for passing an ExecutorService into DirectoryReader.open() to enable concurrent segment reader initialization [lucene]

via GitHub Fri, 21 Nov 2025 06:14:47 -0800


dianjifzm commented on PR #15428:
URL: https://github.com/apache/lucene/pull/15428#issuecomment-3563197448


   Hello everyone,  
   
   I've also been looking forward to parallel initialization for SegmentReader.  
   In fact, I've been hoping for this for several years. It's great to see the 
discussion sparked by the issues submitted by BryceKan3 today.  
   
   Let me explain our use case:  
   Normally, SegmentReader initialization is expected to be fast. However, 
since Lucene supports custom StoredFieldsFormat implementations,  
   we used a StoredFieldsFormat to compress and store all forward column data 
in memory.  
   This change significantly improved I/O performance (about a 30% boost),  
   but it made SegmentReader initialization extremely slow: it can take more 
than 10 minutes.  
   This only affects startup time and does not affect later segment updates or 
other operations, which is why we've been using it for years.  
   
   Another suggestion: using `Executors.newFixedThreadPool` is not ideal.  
   It would be better to use `ForkJoinPool.commonPool()`, because its default 
parallelism is tied to the number of CPUs (available processors minus one), 
which keeps the CPUs busy while avoiding excessive context switching.  
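
To check the common pool's sizing on a given machine, a small sketch like the following can help. The class name `CommonPoolInfo` and its helper methods are made up for illustration; only `Runtime.availableProcessors()` and `ForkJoinPool.getCommonPoolParallelism()` are JDK calls.

```java
import java.util.concurrent.ForkJoinPool;

public class CommonPoolInfo {

    /** Number of processors the JVM reports as available. */
    static int cpus() {
        return Runtime.getRuntime().availableProcessors();
    }

    /** Target parallelism of the shared common pool. */
    static int commonParallelism() {
        return ForkJoinPool.getCommonPoolParallelism();
    }

    public static void main(String[] args) {
        // By default the common pool targets (available processors - 1) workers;
        // the system property java.util.concurrent.ForkJoinPool.common.parallelism
        // overrides that default.
        System.out.println("available processors:    " + cpus());
        System.out.println("common pool parallelism: " + commonParallelism());
    }
}
```

The common pool is shared by the whole JVM, including parallel streams, so long-running segment initialization would compete with any other work submitted to it.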
   
   As I understand it, there is no need to consider `openIfChanged` or 
virtual threads.  
   SegmentReader only needs to be initialized in parallel during startup. 
After that, no further asynchronous behavior is needed, which keeps the risk 
low.  
   The simplest approach is to use `Arrays.parallelSetAll` to initialize the 
`SegmentReader[] readers` array.
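
A minimal sketch of that idea, assuming nothing about Lucene internals: `FakeReader` and `openSegment` are hypothetical stand-ins for SegmentReader and its per-segment initialization, not Lucene APIs. It shows that `Arrays.parallelSetAll` fills the array through the common pool, and that checked exceptions such as `IOException` have to be wrapped because the generator is an `IntFunction`.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

public class ParallelReaderInit {

    /** Hypothetical stand-in for a SegmentReader; not a Lucene class. */
    static final class FakeReader {
        final int ord;

        FakeReader(int ord) {
            this.ord = ord;
        }
    }

    /** Hypothetical per-segment open; a real one would do the slow loading. */
    static FakeReader openSegment(int ord) throws IOException {
        if (ord < 0) {
            throw new IOException("invalid segment ordinal: " + ord);
        }
        return new FakeReader(ord);
    }

    /** Opens all segments, computing the array elements in parallel. */
    static FakeReader[] openAll(int segmentCount) {
        FakeReader[] readers = new FakeReader[segmentCount];
        // parallelSetAll runs the generator for each index on a parallel
        // stream, which executes in ForkJoinPool.commonPool().
        Arrays.parallelSetAll(readers, i -> {
            try {
                return openSegment(i);
            } catch (IOException e) {
                // IntFunction cannot throw checked exceptions, so wrap it.
                throw new UncheckedIOException(e);
            }
        });
        return readers;
    }

    public static void main(String[] args) {
        FakeReader[] readers = openAll(8);
        System.out.println("opened " + readers.length + " readers");
    }
}
```

One caveat with this approach: if one segment fails to open, readers that were already opened by other threads stay in the array, so real code would still need to close them before rethrowing.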


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

