[ https://issues.apache.org/jira/browse/LUCENE-9694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17325030#comment-17325030 ]
Michael McCandless commented on LUCENE-9694:
--------------------------------------------

[~zhai7631] also fixed our {{luceneutil}} and our nightly benchmarks to use this new {{IndexRearranger}} class to create a deterministic index, instead of the crazy slow, single-threaded {{SerialMergeScheduler}} approach they were using before. This resulted in a massive speedup (~4.3X) in the "build the deterministic index for searching" step: [https://github.com/mikemccand/luceneutil/issues/117#issuecomment-822463187]

Thank you [~zhai7631]!

> New tool for creating a deterministic index
> -------------------------------------------
>
>                 Key: LUCENE-9694
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9694
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: general/tools
>            Reporter: Haoyu Zhai
>            Priority: Minor
>             Fix For: main (9.0), 8.9
>
>          Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Lucene's index is segmented, and sometimes the number of segments and the
> arrangement of documents greatly impact performance.
> Given a stable index sort, our team created a tool that records the document
> arrangement (called an index map) of an index and rearranges another index
> (consisting of the same documents) into the same structure (number of
> segments, and documents included in each segment).
> This tool could also be used in Lucene benchmarks for faster deterministic
> index construction (if I understand correctly, the Lucene benchmark currently
> uses a single-threaded approach to achieve this).
>
> We've already had some discussion by email:
> [https://markmail.org/message/lbtdntclpnocmfuf]
> I've implemented the first method, using {{IndexWriter.addIndexes}} and a
> customized {{FilteredCodecReader}} to achieve the goal. Index construction
> takes about 25min, and running this tool takes about 10min.
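The index map idea in the description can be illustrated with a toy model. The following is only a sketch under stated assumptions, not the tool's implementation: it uses no Lucene API, models each segment as an ordered map from a unique document id to its body, and all names ({{IndexMapSketch}}, {{recordIndexMap}}, {{rearrange}}, {{segment}}) are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: a "segment" is an ordered map of document id -> body.
// None of these names come from Lucene; they are illustrative assumptions.
public class IndexMapSketch {

    /** Records, for each segment of the reference index, the ids it holds, in order. */
    public static List<List<String>> recordIndexMap(List<Map<String, String>> index) {
        List<List<String>> indexMap = new ArrayList<>();
        for (Map<String, String> segment : index) {
            indexMap.add(new ArrayList<>(segment.keySet()));
        }
        return indexMap;
    }

    /**
     * Rebuilds the source index so that its segment count and per-segment
     * documents follow the recorded index map. Assumes ids are unique and
     * that both indexes contain the same documents.
     */
    public static List<Map<String, String>> rearrange(
            List<Map<String, String>> source, List<List<String>> indexMap) {
        // Flatten the source so every document can be looked up by id.
        Map<String, String> allDocs = new LinkedHashMap<>();
        for (Map<String, String> segment : source) {
            allDocs.putAll(segment);
        }
        List<Map<String, String>> result = new ArrayList<>();
        int placed = 0;
        for (List<String> ids : indexMap) {
            Map<String, String> segment = new LinkedHashMap<>();
            for (String id : ids) {
                String body = allDocs.get(id);
                if (body == null) {
                    throw new IllegalArgumentException("Document missing from source index: " + id);
                }
                segment.put(id, body);
                placed++;
            }
            result.add(segment);
        }
        if (placed != allDocs.size()) {
            throw new IllegalArgumentException("Source index has documents not covered by the index map");
        }
        return result;
    }

    /** Helper that builds one toy segment from the given ids. */
    public static Map<String, String> segment(String... ids) {
        Map<String, String> segment = new LinkedHashMap<>();
        for (String id : ids) {
            segment.put(id, "body of " + id);
        }
        return segment;
    }

    public static void main(String[] args) {
        List<Map<String, String>> reference = List.of(segment("a", "b"), segment("c"));
        List<Map<String, String>> other = List.of(segment("c", "a"), segment("b"));
        // The rearranged index has the same segments as the reference index.
        System.out.println(rearrange(other, recordIndexMap(reference)));
    }
}
```

In this toy version the other index can be built in any order (for example, by many threads) and is only afterwards forced into the recorded layout, which is the property that makes the result deterministic.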