[ https://issues.apache.org/jira/browse/LUCENE-9694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17272291#comment-17272291 ]
Haoyu Zhai commented on LUCENE-9694:
------------------------------------

Oh, I didn't include that since I want to keep it as generic as possible. But I guess I could add an example {{DocumentSelector}}, as Mike suggested in the PR.

> New tool for creating a deterministic index
> -------------------------------------------
>
>                 Key: LUCENE-9694
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9694
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: general/tools
>            Reporter: Haoyu Zhai
>            Priority: Minor
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Lucene's index is segmented, and sometimes the number of segments and the
> arrangement of documents greatly affect performance.
> Given a stable index sort, our team created a tool that records the document
> arrangement (called an index map) of one index and rearranges another index
> (consisting of the same documents) into the same structure (the number of
> segments, and the documents included in each segment).
> This tool could also be used in the Lucene benchmarks for faster deterministic
> index construction (if I understand correctly, the Lucene benchmark currently
> uses a single thread to achieve this).
>
> We've already had some discussion by email:
> [https://markmail.org/message/lbtdntclpnocmfuf]
> I've implemented the first method, using {{IndexWriter.addIndexes}} and a
> customized {{FilteredCodecReader}} to achieve the goal. Index construction
> takes about 25min, and running this tool takes about 10min.
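
To make the index-map idea from the issue description concrete, here is a minimal plain-Java sketch. It is not the tool from the PR and uses no Lucene APIs: segments are modeled as lists of document keys, natural string ordering stands in for the stable index sort, and the names {{IndexMapSketch}}, {{record}} and {{rearrange}} are hypothetical, chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustration only: models segments as lists of document keys. Hypothetical names, no Lucene APIs. */
final class IndexMapSketch {

  /** Records, for every document key, the ordinal of the segment that holds it (the "index map"). */
  static Map<String, Integer> record(List<List<String>> segments) {
    Map<String, Integer> indexMap = new HashMap<>();
    for (int seg = 0; seg < segments.size(); seg++) {
      for (String docKey : segments.get(seg)) {
        indexMap.put(docKey, seg);
      }
    }
    return indexMap;
  }

  /**
   * Redistributes the documents of another index into the segment layout described by the map.
   * Assumption: natural String ordering plays the role of the stable index sort, so each
   * output segment is sorted after its documents have been collected.
   */
  static List<List<String>> rearrange(
      List<List<String>> segments, Map<String, Integer> indexMap, int numSegments) {
    List<List<String>> result = new ArrayList<>();
    for (int i = 0; i < numSegments; i++) {
      result.add(new ArrayList<>());
    }
    for (List<String> segment : segments) {
      for (String docKey : segment) {
        Integer target = indexMap.get(docKey);
        if (target == null) {
          throw new IllegalArgumentException("Document not in index map: " + docKey);
        }
        result.get(target).add(docKey);
      }
    }
    for (List<String> segment : result) {
      Collections.sort(segment);
    }
    return result;
  }
}
```

In this sketch, recording the map from a reference index and then calling {{rearrange}} on a differently segmented index holding the same documents reproduces the reference layout. The implementation described in the issue reaches the same goal with {{IndexWriter.addIndexes}} and a filtering reader per target segment; how that reader selects documents is not shown here.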