dianjifzm commented on PR #15428: URL: https://github.com/apache/lucene/pull/15428#issuecomment-3563197448
Hello everyone, I've also been looking forward to parallel initialization for SegmentReader, and have been for several years. It's great to see the discussion sparked by the issues BryceKan3 submitted today.

Let me explain our use case. Normally, SegmentReader initialization is expected to be fast. However, since Lucene supports StoredFieldsFormat extensions, we used a custom StoredFieldsFormat to compress all forward column data and keep it in memory. This significantly improved IO performance (by about 30%), but it made SegmentReader initialization extremely slow, taking more than 10 minutes. The cost is paid only at startup and does not affect subsequent segment updates or other operations, which is why we have been able to live with it for years.

Another suggestion: using `Executors.newFixedThreadPool` is not ideal. It would be better to use `ForkJoinPool.commonPool()`, whose default parallelism tracks the number of available processors, so it makes full use of the CPUs while avoiding excessive context switching.

From my understanding, there is no need to consider `openIfChanged` or virtual threads. SegmentReader only needs to be initialized in parallel during the startup phase. After that, no further asynchronous behavior is needed, which minimizes risk. The simplest way is to use `Arrays.parallelSetAll` to initialize the `SegmentReader[] readers` array.
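As a rough illustration of the `Arrays.parallelSetAll` idea, here is a minimal, self-contained sketch. It does not use Lucene at all: `SlowReader` and `openAll` are hypothetical stand-ins for `SegmentReader` and the code that builds the `readers` array, and the real constructor arguments are not modeled. The stand-in constructor is declared to throw `IOException` only to show that a checked exception has to be wrapped inside the generator lambda.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.List;

public class ParallelReaderInit {

    /** Hypothetical stand-in for a reader whose construction is expensive. */
    static final class SlowReader {
        final String segmentName;

        SlowReader(String segmentName) throws IOException {
            if (segmentName == null) {
                throw new IOException("missing segment name");
            }
            // A real reader would load and decompress its data here.
            this.segmentName = segmentName;
        }
    }

    /** Builds one reader per segment name, filling the array slots in parallel. */
    static SlowReader[] openAll(List<String> segmentNames) {
        SlowReader[] readers = new SlowReader[segmentNames.size()];
        // Each index is computed independently, so slot i always holds segment i.
        Arrays.parallelSetAll(readers, i -> {
            try {
                return new SlowReader(segmentNames.get(i));
            } catch (IOException e) {
                // IntFunction cannot throw checked exceptions, so wrap it.
                throw new UncheckedIOException(e);
            }
        });
        return readers;
    }

    public static void main(String[] args) {
        SlowReader[] readers = openAll(Arrays.asList("_0", "_1", "_2", "_3"));
        for (SlowReader reader : readers) {
            System.out.println(reader.segmentName);
        }
    }
}
```

In the JDK, `Arrays.parallelSetAll` is built on a parallel stream, so the work runs on `ForkJoinPool.commonPool()` and no dedicated executor has to be created or shut down. One thing a real change would still need to decide, and which this sketch leaves out, is how to close the readers that were already opened when one of them fails.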
